All right, we can go ahead and get started. Can everybody hear me? No problems? All right. This talk is Real-Time Steganography with RTP. I'm Druid, founder of the Computer Academic Underground. HD and I co-founded the Austin Hackers Association, and I'm currently employed by TippingPoint DVLabs doing VoIP security research. Quick overview of what I'm going to be talking about. Real briefly, I'm going to talk about what VoIP is, what RTP is, what it's used for, a little bit about audio steganography, and then also briefly some previous research in the area. Next, I'm going to talk about using steganography with the Real-time Transport Protocol, and the problems and challenges that I encountered while attempting to develop this project. After that, I'm going to speak about my implementation, which is called SteganRTP: a little bit about it, the goals that I set forth, the architecture, operational flow, the different methods and structures that I'm using in the protocol, the different functional subsystems of the tool and what they're used for, and then which of the challenges that I identified in the previous section I was able to meet and which I wasn't. After that, I'm going to attempt a live demo, and then speak a little bit about conclusions and what I intend to do with this project moving forward. And then after that, Q&A will be in one of the Q&A rooms. So VoIP and RTP, again real briefly, this is one slide. If you don't know what Voice over IP is, it's basically internet telephony. It's what a lot of the traditional telephony networks are moving towards. RTP is the Real-time Transport Protocol. Other than IAX, almost all other VoIP protocols utilize this for the audio channel of the call. There's a number of different signaling protocols used. There's SIP, H.323, IAX, some others, but almost all of them use RTP for the audio. So a little bit about audio steganography, just so you have some idea of how this actually works.
Basically, steganography comes from the Greek root words steganos and graphein, which literally means covered writing. The primary goal of steganography is to hide the fact that covert communication is taking place. And with modern methods of steganography, it's essentially hiding one piece of data inside another piece of data. So some terms I'm going to be using. When I say the word message, I'm essentially talking about the data that's going to be hidden or extracted. Cover medium is the medium in which data is going to be hidden. It's also sometimes referred to as the cover image, cover audio, whatever the type of cover medium is. Stego medium is a medium in which data is already hidden. And redundant bits are essentially bits of data in the cover medium that you can manipulate and modify without compromising the medium's perceptible integrity. Specifically with audio, these are things you can change so that the human ear will still hear the same audio, but it's digitally different. There are two primary types of covert channels used in steganography. There's storage-based, which is essentially data that's embedded into a static cover medium, like an image or an MP3 file or something like that. And you extract it from that static piece of data. Essentially, that covert channel is persistent. You can embed the data five years ago and extract it now, and the data will still be there. You also have timing-based covert channels, which are essentially signals that are sent by modulating the behavior of either the environment that the system is running in or the tool itself. And that type of communication is transient. A good example of this is producing a load that changes the CPU response time. So if you have a very quick CPU response time, you might interpret that as 0, and if it's above a certain threshold, you might interpret that as 1. By modulating your application's behavior, you can signal that data to an observing process or application.
Digitally embedding a message in a cover medium generally involves two basic steps. You're going to identify the redundant bits of the cover medium, and then you're going to decide which of those you want to use and manipulate them to embed your data. Usually, you can get away with using the least significant bit of each word in the cover medium. Sometimes, depending on the type of cover medium, you can get away with using two or even three of the least significant bits. But in most digital mediums I've used steganography with, one is plenty for what you're trying to do. Media formats in general tend to be very inaccurate data formats because they don't need to be accurate. The human ear is not very good at differentiating sounds. For example, if you were to record an orchestra performance with two different recording devices in the same manner, you're going to end up with two completely different recordings when looked at digitally. But when you play them back, they're going to sound fairly identical. Changes in an audio bit stream can be made so slightly that when played back, the human ear won't be able to tell the difference. And also, with audio, like I said, you can usually use the least significant bit from each byte, or from each word depending on the word size, to embed your message. Here's a quick example of doing audio embedding in an 8-bit audio file. So let's assume that we have these 8 bytes in a cover audio file. They're shown in binary, and I've highlighted the least significant bits for you. If we wanted to hide the byte 214, we replace the least significant bit of each byte with one bit of our message. And here's a comparison of the original plus the modified. If you'll notice, only about half of those values actually changed. So the impact of making these manipulations can be very, very small. Some previous research in the area: steganography using audio as a cover medium is nothing at all new.
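The 8-bit embedding example described above can be sketched in a few lines of Python. This is only an illustration, not code from the tool: the eight cover bytes are made-up values (the bytes on the slide are not in the transcript), and the function names `embed_byte` and `extract_byte` are hypothetical.

```python
def embed_byte(cover: bytes, message_byte: int) -> bytes:
    """Hide one message byte in the least significant bits of 8 cover bytes."""
    if len(cover) != 8:
        raise ValueError("need exactly 8 cover bytes to hide 1 message byte")
    out = bytearray(cover)
    for i in range(8):
        bit = (message_byte >> (7 - i)) & 1  # take message bits most significant first
        out[i] = (out[i] & 0xFE) | bit       # clear the cover LSB, then set it to the message bit
    return bytes(out)


def extract_byte(stego: bytes) -> int:
    """Rebuild the hidden byte from the LSBs of the first 8 stego bytes."""
    value = 0
    for b in stego[:8]:
        value = (value << 1) | (b & 1)
    return value


# Made-up 8-bit audio samples standing in for the cover audio on the slide.
cover = bytes([0xB4, 0xE5, 0x8B, 0xAC, 0xD1, 0x97, 0x15, 0x68])
stego = embed_byte(cover, 214)
changed = sum(1 for a, b in zip(cover, stego) if a != b)
print(f"hidden byte recovered: {extract_byte(stego)}, cover bytes changed: {changed} of 8")
```

Because a cover bit already matches the message bit about half the time, roughly half of the samples are left untouched, and any sample that does change moves by at most one step.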
There's a number of stego tools that will operate on different static files, like MP3s, WAVs, VOCs, AUs, et cetera. With VoIP steganography, there have been some previous research efforts. A lot of them use steganographic techniques, but not really for steganography. For example, there's one project that uses the redundant bits in the audio to widen the actual audio band. So it's using a steganographic technique in an overt manner. There's another one that replaces RTCP, which is the Real-time Control Protocol. And then there's also one that does watermarking of audio for integrity checking and things like that. Some of the deficiencies of those research efforts are, like I mentioned, that they don't really achieve steganography using these techniques. They're used for other purposes. Also, some of them are just theory papers. They don't really explain what they're doing or how they intend to accomplish what they're talking about. And I've only found one that actually had a public proof of concept. And that came out a week and a half ago. I'm not going to talk too much more about this because I actually have an analysis paper I'm going to be publishing in about a month or so covering all of the different projects that I found prior to doing this research. So moving forward, I'm going to talk about utilizing steganographic techniques with an active communications channel like RTP. Real quick, some context terminology. Because I'm talking about steganography as well as networking, and the two disciplines use certain terms with different meanings, I want to be clear. When I say a packet, I'm talking about it in the network sense. It's a network data packet. In this case, it's normally going to refer to an RTP packet. When I say message, I'm using that in the steganographic sense. It's the data that's going to be embedded or extracted.
One thing that might be a little confusing is that the protocol I've designed to run in the steganographic channel uses formatted data messages which look like packets. So just to be clear, when I say messages, that's what I'm talking about, even though it might look like a packet. And hopefully, I'll be able to use these terms consistently so as not to confuse anyone. As I said earlier, in steganography, you generally have two general types of application. You have storage and timing. Almost all existing tools and uses of steganography that I've found regarding audio or VoIP use a storage type of method. And that's the same type that I'm going to be using. They implement separate hide and retrieve modes. And the closest thing I've found to a real-time use of steganography is the project that came out about two weeks ago, which is called Vo²IP. It's basically embedding a secondary audio channel encoded in a different codec inside an overt audio channel running in RTP. So it's similar to what I'm doing, but instead of embedding a data communications protocol like I'm doing, they're actually embedding different audio. And there's a couple of deficiencies in their research, which I'm going to be including in my analysis paper. So basically what RTP provides is a streaming media mechanism for VoIP to transport the audio of a call. And it provides the opportunity to establish a real-time covert communications channel within that. So RTP packet payloads are essentially just encoded multimedia. During my research, I focused on RTP audio only, but RTP can also carry video. It can carry text. RTP can basically transport any type of real-time data. The frequency, locations, and number of redundant bits are determined by whichever codec is used to encode that data. And codecs like G.711, which is what I used for my research, use a one-byte sample encoding. And like I mentioned earlier, that's generally pretty resilient to least significant bit changes to the audio.
Codecs with larger samples might provide for more than one bit, but I just stuck to one with my testing. And that's my cat. All right, so audio codec word sizes. I focused on G.711 because it's fairly straightforward. It uses an 8-bit sample word size. There's a number of other codecs. I only listed a few here because I found these other ones interesting. Speex actually has a dynamic variable word size. It changes its sample size on the fly. So that would present an interesting challenge to a steganographic embedder. And iLBC actually uses a class-based bit distribution. What it does is it takes all of the audio samples, classifies the bits in each sample, and then it rips all those bits apart and groups them by class, and then embeds that in RTP. So in that case, you may have all of the least significant bits all at one end of the RTP packet payload. So that would require its own type of custom embedder as well. Some throughput I was able to achieve for G.711. For most of my testing, I had a 160-byte RTP payload using an 8-bit sample word size and utilizing just the least significant bit from each sample word. That's basically eight samples needed to embed one byte of data. At 50 or so packets per second in either direction, you get about 1 kilobyte per second. So the problems and challenges I ran into trying to use steganography with RTP. Probably the most significant one is that RTP utilizes UDP for its transport. UDP is connectionless and unreliable. Due to the transport, the message data that's being split across these multiple packets might arrive out of order, or some of it might not arrive at all. The next one was cover medium size limitations. As I showed you a minute ago with how much data could actually be embedded using fairly small streaming-type packets, there's not a whole lot of room to work with when you're only using a single bit out of each sample.
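The throughput figure quoted above follows directly from the numbers given in the talk (160-byte payload, 8-bit samples, one bit per sample, about 50 packets per second). A small sketch of that arithmetic, with the function name `covert_capacity` being a hypothetical helper for illustration only:

```python
def covert_capacity(payload_bytes: int = 160,
                    sample_bits: int = 8,
                    bits_per_sample: int = 1,
                    packets_per_second: int = 50) -> tuple[int, int]:
    """Return (covert bytes per packet, covert bytes per second) for LSB embedding."""
    samples = (payload_bytes * 8) // sample_bits             # sample words in one RTP payload
    bytes_per_packet = (samples * bits_per_sample) // 8      # 8 embedded bits make 1 covert byte
    return bytes_per_packet, bytes_per_packet * packets_per_second


per_packet, per_second = covert_capacity()
print(f"{per_packet} bytes per packet, {per_second} bytes per second in each direction")
```

With the defaults this works out to 20 bytes per packet before any message header is subtracted, which is why the data has to be spread across many packets.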
So when I use the RTP payload for a steganographic purpose, the data is inevitably going to be split across multiple packets. So we need to have some mechanism for reassembling it on the receiving system. Next is latency. RTP is extremely sensitive to network latency and quality of service issues. When we're taking these packets in, and we may or may not need to manipulate them before sending them on, we can't hold on to them for too long or we're going to start affecting the covert nature of the communications. It's going to be audibly noticeable in the user experience if we're messing with the RTP stream too much. So another thing about RTP is that it essentially sets up two packet streams for a call, one in either direction. And when you start adding conferencing and things like that, you get more. But between two parties, you basically have one RTP stream going in one direction and one going in the other. The challenge there is that you need to be able to correlate those so that you can hook into the correct streams and make sure your communication is going to the correct places. Another challenge is compressed audio. The audio can be compressed in the payload. So to successfully embed into a compressed payload, you would essentially have to uncompress, add your information, recompress, and then repackage that and send it on, or skip those packets entirely if there's only certain ones that are being compressed. You also have to deal with media gateway audio modifications. As RTP traverses the network, it may cross one or more of these intermediary devices. What they can do is re-encode the data in a different codec. They can change the sizes. They can do any number of things to the RTP stream, which might compromise the integrity of your steganographic channel. So basically, identification of any of these has to be considered. And you have to be able to find a way to overcome that if those types of devices are being traversed.
Also, an interesting thing about RTP is that endpoint devices can change audio codecs on the fly. They may be operating with G.711 for the first five minutes of the call, and then switch to something else as their network latency becomes higher and they need a more efficient codec. So you have to be able to track that and change your encoding and embedding methods on the fly. So now I'm going to talk about my particular implementation. And this is essentially a reference implementation. And by calling it that, I only have to worry about conveying the functionality. I don't really need to secure my code because I write horribly insecure code. So about SteganRTP. First, it's the most awesome tool name I've ever come up with. And that's usually my major motivation for working on any particular project. This implementation basically requires that the application be either on the endpoint device or an active man in the middle. However, I demonstrated at USEC West that you can do passive injection of RTP audio. In this case, it would be exactly the same if you wanted to do a passive type of injection. But I haven't implemented that in my particular tool. Right now, it runs on Linux. It uses a windowed curses interface. And for receiving, it only has to be able to observe the inbound RTP stream. It doesn't actually have to be a man in the middle or anything like that, as it does for the outbound RTP stream. So you pair it with ARP poisoning, arpspoof, something like that to achieve that type of architecture. For my goals, first, I obviously wanted to achieve steganography, to hide the fact that there's a covert communication taking place. I wanted to have a full-duplex communications channel so that I could be sending data in both directions at the same time. I wanted to compensate for UDP's unreliability. I wanted transparent operation, whether it's acting as an active man in the middle or operating on the endpoint. And I wanted simultaneous transfer of multiple types of data.
I didn't want to just send a secondary audio stream. I wanted to be able to send chat data, file data, remote shell, et cetera. So here's a quick look at the architecture. The phones are basically your soft phone or your client application. The gray boxes are host systems. The green boxes are the application. And as you can see, there are the two RTP streams going in either direction. For the outbound stream, the one being sent towards the remote endpoint, the application needs to bridge that stream. And then it needs to be able to observe the one coming in the other direction. Here's the man in the middle. It looks almost exactly the same, except that instead of the application running on the endpoint devices or the endpoint hosts, the instances are out there somewhere along the path that RTP takes. So here's a quick look at the process flow. Basically, the tool initializes. It identifies an RTP session based on some constraints that you give it on the command line. It hooks the packets for those RTP streams. It reads packets. It determines whether each one is an inbound or an outbound packet. For inbound, it immediately sends it on because we don't need to modify it or do anything else to it. We just need to keep a copy and look at it after we go ahead and send it along. We extract any potential message data, decrypt that potential message data, and then check an identification checksum. That tells us whether it actually is or is not message data. If it is, we send it to a message handler. For outbound, we check and see if there's any data that we have waiting to go out. If there is, we read it. We create a new steg message for it. We encrypt it. We embed it into the RTP packet, and then we send it on. If there wasn't any data waiting to go out, we just pass the packet unmodified. So for identifying an RTP session, I basically use libfindrtp, which is a previous project of mine. Given some constraints, it watches the network for RTP sessions, and it identifies them based on the signaling.
So one of its deficiencies is it has to see the beginning of the call in order to work. If you have an active RTP session going, it won't actually identify that. However, you can pass all of that information on the command line to the tool if you already know there's an active RTP stream, and then it'll hook it directly. And it supports the SIP and Skinny signaling protocols for identification. So for hooking packets, I'm basically using the Linux Netfilter hook points. You basically just add an iptables rule with a target of QUEUE. What that does is it passes the packets to a user space queuing agent. It's basically just an API for reading and manipulating packets and then telling Netfilter whether to drop them or pass them or anything you would normally do with iptables. Quick look at the Netfilter hook points. The two we're interested in are called pre-routing and post-routing. You'll see those on the top left and top right. Essentially the pre-routing hook point gets packets before the local system has done anything with them, and the post-routing hook point gets all the packets after the local system is completely finished with them. And that's what I'm interested in for mine because I wanna try to maintain the integrity of the stego channel. So hooking packets. Basically the tool registers itself as a user space queuing agent for Netfilter. And then it creates those rules that I mentioned. And then it's able to read packets from the queue, modify them if it needs to, place them back into the queue and then tell the queue to accept the packet for forwarding. Inbound packets, like I said, we can immediately accept because we don't need to modify them. That hopefully helps with some of the latency issues with RTP, because it just immediately sends the packet on. That basically allows for very low impact on the latency.
With the copy of the packet, we extract the message, we decrypt the message, and we verify whether it actually is a message or is just an RTP packet that didn't have anything embedded into it. And then we send the message, if it is valid, to a message handler. For outbound packets, we poll for data that's waiting to go out from the tool. If there isn't any, we immediately send the packet. Otherwise, we create a new message based on the properties of the RTP packet we're going to be embedding into. We read as much of the waiting data as will fit in the message that's going to be embedded. We encrypt it, and then we embed the message into the RTP payload as cover medium, and then we send it. So some session timeout stuff. If there's no RTP packets seen for the timeout period, it bails and it starts looking for a new session to hook into. The inbound message handler basically receives all valid incoming messages as determined by the main packet system. When it receives a control message, it handles any internal state changes or administrative tasks that might be required. As you see, I've got a couple in there like echo request, echo reply, a resend of a missing message, closing files, opening files, things like that. And then it also receives incoming user chat data, receives incoming file data, and receives incoming shell data. So a quick look at packets and messages. I'm not going to go into all of these in detail because it's pretty dry, but here's what an RTP packet looks like. I've highlighted a couple of the fields that we're going to be interested in later. PT is the payload type, which basically tells you what audio codec the payload's encoded with. There's a sequence number, there's a timestamp, and then the RTP payload, obviously, is what we're interested in. The message format I'm using for my communications protocol, which runs in the steganographic channel, is basically just a checksum ID field, which is 32 bits, a 16-bit sequence number, and then a TLV structure.
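To make the message layout just described more concrete, here is a minimal Python sketch of packing and unpacking such a header. The 32-bit ID and 16-bit sequence number come from the talk; the 8-bit widths for the type and length fields, the network byte order, and the names `pack_message` and `unpack_message` are assumptions made purely for illustration.

```python
import struct

# ID (32 bits) and sequence (16 bits) are as described in the talk.
# Type and length are ASSUMED to be 8 bits each; the talk does not give their widths.
HEADER = struct.Struct("!IHBB")


def pack_message(msg_id: int, seq: int, msg_type: int, value: bytes) -> bytes:
    """Build header + value; the length field counts only the bytes after the header."""
    if len(value) > 0xFF:
        raise ValueError("value too long for the assumed 8-bit length field")
    return HEADER.pack(msg_id, seq, msg_type, len(value)) + value


def unpack_message(data: bytes) -> tuple[int, int, int, bytes]:
    """Split a raw message back into (id, sequence, type, value)."""
    msg_id, seq, msg_type, length = HEADER.unpack_from(data)
    value = data[HEADER.size:HEADER.size + length]
    if len(value) != length:
        raise ValueError("truncated message")
    return msg_id, seq, msg_type, value


raw = pack_message(0xDEADBEEF, 7, 1, b"hi")
print(len(raw), unpack_message(raw))
```

Under these assumed field widths the header is 8 bytes, which is a large share of the roughly 20 bytes a single 160-byte G.711 payload can carry with one bit per sample, so a compact header matters.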
I tried to keep this header information as short as possible because of the space constraints while simultaneously trying to pack as much functionality into it as possible. Message header fields: like I said, we have a 32-bit ID. That's basically what we're going to use to identify whether the message is a valid steg message or not, and that is a 32-bit hash of a key hash, which is from user input, and the sequence, type, and length fields, which are the remainder of the header. The sequence number is your standard incrementing sequence number. The type tells us what type of steg message it's going to be, and the length is the remaining size after the header. Some message types. There's a number of control messages: echo requests, echo replies, opening files, closing files, those are all control messages. We also have chat data, file data, and shell input data and output data. Control messages can basically be stacked if they will all fit into a single steg message, or RTP packet, and each one is essentially another TLV structure. It's control type, length, and then value. Control types: like I said, we have echo request, echo reply, resending of any type of missing messages, and start file and end file for file transfers. Echo request is pretty straightforward. It's a sequence and a payload. Same thing for reply. For resend, we just tell it what message we want resent. For start file, we give it an arbitrary file ID and then the file name. Close file just passes the file ID. For chat data, you can see there's no extra header information; that's determined by the primary steg message's type field. For file data, it's almost the same except we include the file ID, and for shell data, it's determined by the steg message's header. So, functional subsystems. Basically, we have the encryption system. It's not really encryption. Right now I'm just using pseudo-crypto with an XOR against a bit pad. I use XOR because it's lightweight. I can implement it quickly.
And it uses a SHA-1 hash as its bit pad, which is a hash of the password that's supplied to the tool by the user. The XOR operation is begun at an offset into the hash, which is determined by the RTP header fields that I highlighted earlier. Basically, it's a 32-bit hash of the user-supplied key hash, the RTP sequence number, and the RTP timestamp, and we take that modulo 20 to get an offset into the 20-byte pad. The embedding system. Right now, this only operates on the G.711 codec. I'm using the common least significant bit embedding method. The properties of the RTP packet tell you what your total available size is for embedding based on the codec and things like that. It's essentially the payload size divided by (the word size times eight), since I'm using a single bit and I need eight bits to embed one byte of data. And the payload size for the steg message is the total available minus the message header length, which was shown earlier in the diagram. The extracting system is basically just the reverse of the embedding function: a pass through the crypto function, then a verification of that ID field to make sure that it's a valid message. The outbound data polling system. I basically implemented it as just a big linked list of file descriptors that may or may not have data waiting to go out. We can poll those, and they're in a particular order of preference. So, raw messages will always go out first if there's one waiting to go out. After that, control messages will go out, then chat data, input for the remote shell service, output from the local shell service if you're running one, and then individual file transfer data for any number of particular files you may have going. And, like I said, they're prioritized in that order. The message caching system. All inbound and outbound messages are cached. We need to do that so that if the remote tool requests a resend, we can look it up in the cache and send it along.
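Going back to the obfuscation step described above, here is a rough Python sketch of XORing a message against a SHA-1 pad starting at an offset derived from the RTP sequence number and timestamp. The talk does not say which 32-bit hash produces the offset, so the first four bytes of a SHA-1 digest are used here purely as a stand-in; the function names are hypothetical as well.

```python
import hashlib
import struct


def key_pad(password: str) -> bytes:
    """20-byte bit pad: the SHA-1 hash of the user-supplied password."""
    return hashlib.sha1(password.encode("utf-8")).digest()


def pad_offset(pad: bytes, rtp_seq: int, rtp_timestamp: int) -> int:
    """Starting offset into the pad, derived from the key hash and RTP header fields."""
    material = pad + struct.pack("!HI", rtp_seq, rtp_timestamp)
    # ASSUMPTION: stand-in 32-bit hash; the tool's actual hash function is not specified.
    h32 = int.from_bytes(hashlib.sha1(material).digest()[:4], "big")
    return h32 % len(pad)  # modulo 20 for a SHA-1 pad


def xor_obfuscate(data: bytes, password: str, rtp_seq: int, rtp_timestamp: int) -> bytes:
    """XOR data against the pad, wrapping around; applying it twice restores the input."""
    pad = key_pad(password)
    start = pad_offset(pad, rtp_seq, rtp_timestamp)
    return bytes(b ^ pad[(start + i) % len(pad)] for i, b in enumerate(data))


hidden = xor_obfuscate(b"hello", "password", rtp_seq=4242, rtp_timestamp=160000)
restored = xor_obfuscate(hidden, "password", rtp_seq=4242, rtp_timestamp=160000)
print(hidden.hex(), restored)
```

Since the offset depends on per-packet header fields, the same plaintext generally lines up with a different part of the pad from one packet to the next. As the talk itself stresses, this is obfuscation and not real encryption.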
Also, if we start receiving out-of-order messages, we can cache them, wait for the one we're expecting to come in, and then go back to the cache for the subsequent messages. Now the challenges met in the time I had up to the conference, up to about 8 p.m. last night. First, unreliable transport. Basically, the message's sequence number together with the caching system provides requesting and identification of resent messages, reordering of out-of-order messages, and it also provides replay protection in case someone's somehow hooking your steg channel and then replaying things. Cover medium size limitations. Fortunately, there are plenty of RTP packets being sent every second in the average RTP session. As such, even though we're restricted in how much data we can put in each one, we can spread the data out over multiple packets. For user chat, interactive shell access and transfer of small files, the throughput, like in my example, was perfectly fine for me. Latency. To address the sensitivity of RTP to latency, I made a couple of architectural decisions that I've already mentioned. I designed the packet hooks and data polling system to pass RTP packets immediately if we don't need to do anything to them. As I mentioned earlier, I also went with XOR rather than real crypto to cut down on how long I was having to handle these packets. In this case, crypto really only needs to provide a little bit of obfuscation and some entropy to protect against some rudimentary steganalysis of the stream. I'm not actually using it to try to protect the data. My goal was primarily just to achieve steganography, which is hiding the fact that the communication was taking place. Tracking the RTP streams. Luckily, I had already done this with a previous project of mine. I'm using that for identification and then I'm using libipq, which is the Netfilter user space queuing library, for tracking and hooking the packets. Audio codec switching.
To handle those devices that switch codecs mid-session and change things around, I designed the embedding system to operate on individual packets. So the parameters used to perform the embedding are derived from the RTP packet that it's about to target. So if the RTP session's switching codecs, changing things around, we really don't care. As long as we support the codec that the audio is encoded in, then the embedding system will work on it. Otherwise, it just passes that packet unmodified. Okay, so now I'm gonna attempt to do a live demo. Hopefully it will work. If it crashes all over the place, you can heckle me and laugh. And I need to set up my... Okay. Up here in the top right, I have a soft phone. Actually, I need to show you something else first. Might help if you had some idea of what I was doing. Here's a quick look at the demo scenario. I'm basically going to be demoing both types of operation, the active man in the middle as well as the application running on the endpoint. And here's what the demo environment looks like. The WinXP host is the laptop I'm running the presentation on. It's gonna be running a soft phone, which is the standalone endpoint. The Asterisk server is gonna act as the other endpoint. It's also gonna have an instance of the tool running on it. And then the Slackware box is the active man in the middle, which will be running the other instance of the tool and will be ARP poisoning to receive traffic from the other two endpoints. All right, so now I'll switch back. All right, so on the Asterisk box, we have Asterisk running over here. On this one, we're gonna run the tool with a password of "password". And then we're gonna be targeting the WinXP box, which is dot one. And the dash S basically says we're gonna run the shell service on this endpoint. And on the man-in-the-middle box, we have arpspoof running. It's basically spoofing both of the other devices so that it receives all the traffic.
And then we're gonna run almost the same command over here, except we're gonna target the other host system and we're not gonna run the shell service on this side. All right, and then we can get some audio data going in one direction from my mic. And then audio in the other direction will be coming from Asterisk. I'm just gonna dial into the Asterisk server and let it play the voice prompts for that side of the connection. Hopefully you'll be able to hear this. Although you don't need to really hear the audio too much. So you can see over here on the tool, it basically was watching for the session based on the constraints I gave it on the command line. It found the RTP session, it hooked the packets and now we're ready to go. On this side, it'll basically say local and then whatever you've typed, and for the remote side, it'll say remote and whatever the remote person typed. So this guy's gonna say, can I have some files? This guy's gonna send big.txt, small.txt. You see up in the top right in the output status, it's sending the files and it gives you a confirmation once those are done. He says, yay. And he's gonna request using the shell. That guy says, okay. So he issues the slash shell command, which will switch from chat mode into shell mode. You see this is basically interfacing with the shell running on the remote side. Does some commands, does what he's gonna do. Switches back to chat mode. Yay. And that's pretty much it for the demo and it didn't crash, I'm amazed. Basically if this side shuts down, it'll come back over here and after about 45 seconds or so, it'll realize it hasn't received any steg messages, it'll start sending echo requests and trying to confirm that the remote side is still there. I'm not gonna wait for that because I don't have that much time. But that's basically it for the demo.
And essentially all of that data that was going between the two clients was being split up, embedded into the RTP audio payloads and then sent across the wire. So some conclusions. Basically I met all of my design goals. I met most of the challenges that I identified. The two that I didn't tackle were the compressed audio and the media gateway interference. I plan to work on both of those. The compressed audio should be pretty straightforward. You just need to create uncompressing and compressing functions. The media gateway one will probably be a little more difficult. To prevent this type of thing, VoIP deployments should be using SRTP, which is essentially the secure version of the Real-time Transport Protocol. It prevents the man-in-the-middle scenario entirely because the contents of the RTP payloads are encrypted and you can't really modify them in transit. It'll prevent most of the endpoint-type scenarios unless the tool that's doing the steganographic embedding is also part of the soft phone, or operates on the audio prior to it actually being put into RTP and encrypted. Future work. I wanna improve my G.711 codec embedding algorithm. Right now I'm basically embedding in every single sample word. By doing some silence and voice detection we can prevent some more rudimentary steganalysis by only embedding into the more random-looking data where there's actually speech going on. With G.711 in particular, silence is fairly normalized in the encoded data. I'm gonna create embedding algorithms for additional audio codecs. I also wanna tackle some video codecs. I plan to replace the encryption system with real crypto rather than XOR. That was kind of just a stopgap proof of concept to get things running. I wanna support larger steg messages by adding some fragmentation features. And I wanna expand the shell access functionality into a services framework so you can provide other things than just shell access across the channel.
I should have a white paper out sometime in the next month, going into much more excruciating detail than I did here, hopefully before the next Uninformed Journal goes out. And the source code I uploaded to the SourceForge project page about two hours ago. You can go get this right now. There's both CVS access and a packaged release. You can grab either one. But in order to build it, you're gonna need libfindrtp, which is also on SourceForge. And I'm going to be taking Q&A in the Q&A room, but I think I have about five minutes before I have to do that. So I'll take a few questions here if anyone has any. Yes. One of the things I'm doing, like I mentioned, is that if I don't have to do anything to the packet, I send it on unmodified, and that's almost instantaneous. I haven't seen any latency with those. If you're just doing chat, you might have one or two modified packets every few seconds. It depends on how long it takes you to type in your message and send it along. If you're sending a file, it's modifying things fairly consistently. Using my XOR encryption, or pseudo-encryption, and my current embedding method, I think it was about a millisecond to a millisecond and a half. One of the things I'm gonna have to pay real close attention to when I replace the encryption system with real crypto is how much more latency that creates. If it starts to become more noticeable, steganalysis is gonna pick up on that real fast. Right, right. Yeah, so far I haven't had any problems with jitter or any of the RTP effects dealing with latency. Anyone else? Yes. Before what, Brickstone? Right, I'm not actually doing any type of forward error correction or anything like that. I did look at that in the beginning, but due to the limited amount of space, doing forward error correction would take a lot more room than just requesting a resend of the packet.
Essentially, the way it works now is if I miss a packet coming in, I start caching other incoming packets until I get the one I'm expecting, either by sending a resend request or something like that, and then I immediately go back to the cache, which is much, much faster than waiting for them to come in. So far, I haven't actually had the tool get out of sync or anything like that, but most of my testing has been in the lab, so I'd be interested to find out if anyone's actually running this in live environments and if they get poor results or if it works fine. I'm sure my caching system could probably use some improvement, but so far I haven't run into any problems. Yes? It depends on what you're doing. Using XOR, you run into some problems with steganalysis against things like ASCII. ASCII has certain properties such that when you're doing statistical analysis on the payloads, even if you're using XOR to obfuscate it, you'll still start to notice those types of patterns. One of the things I tried to do was create more of a binary protocol than an ASCII protocol to help prevent that type of thing, but it depends on which type of encryption you're using; AES may have a different type of analysis signature than Triple DES or some of the other ones. Really, I'd just have to test. Yeah, what I was just basically thinking was you're talking about real crypto, but you're talking about zeroes and commons. It may be. It also kind of depends on the audio codec. With some audio codecs, when they encode the audio, like I said, with G.711, the silence looks fairly normalized. You get a lot of 7Fs, 7Es, FFs, FEs. When you start playing with those particular bytes, it may become more noticeable. That, like I said, also depends on the codec you're using. Yes, I have no intention to do that. Next question. Anyone else? Yes. The man-in-the-middle scenario? Yes. My implementation is pretty bloated, but you could probably write a smaller one that would work on embedded devices.
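The caching behavior described at the start of this answer can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code. It assumes each steg message carries a monotonically increasing integer sequence number and ignores wraparound. The class name `ReorderCache` and its return convention are hypothetical.

```python
# Hypothetical sketch: hold out-of-order messages until the gap is filled.

class ReorderCache:
    def __init__(self, first_seq: int = 0):
        self.expected = first_seq   # next sequence number to deliver
        self.pending = {}           # seq -> message, held until deliverable

    def receive(self, seq: int, message: bytes):
        """Return (messages now deliverable in order, seq to re-request or None)."""
        if seq < self.expected:
            return [], None         # duplicate of something already delivered
        self.pending[seq] = message
        ready = []
        # Drain the cache for as long as the next expected message is present.
        while self.expected in self.pending:
            ready.append(self.pending.pop(self.expected))
            self.expected += 1
        # Anything still cached means a gap: ask the peer to resend it.
        resend = self.expected if self.pending else None
        return ready, resend
```

Once the missing message arrives, everything behind it is delivered straight from memory, which is the "go back to the cache" step described above. A fuller version would bound the cache size and rate-limit the resend requests.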
I wouldn't suggest using mine. If you wanted to do something like exploit a system and then load this as shellcode, you'd definitely have to use a Metasploit multi-stage loader to even get it anywhere near that box. You could probably do a more slimmed-down version if you just wanted chat data or you just wanted to send files. That'd be quite a bit smaller than implementing all the different features that I have. I think I only have about a minute left before I have to move to the Q&A room. Does anyone else have a question? Yes. Yes, that's a good point. I actually am doing that. If I have a message that's smaller than the payload I'm gonna be embedding into, I go ahead and randomize the bit changes for the rest of the payload to keep it uniform within that particular packet. Doing that for packets I'm not embedding anything in might also be something to do. Like I mentioned, I haven't done any steganalysis against this at all yet, because I've been working on implementing it so far. So those types of things I'll definitely be looking at. Any more? I guess that's it then. Thank you very much. Thank you.
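The padding answer above, where a short message is filled out with random bits so the whole payload is modified uniformly, might look something like this. It is a sketch under assumptions: capacity is counted in bytes, the receiver knows the real message length from a header, and the helper name `pad_to_capacity` is hypothetical.

```python
# Hypothetical sketch: pad a short message with random bytes up to the
# embedding capacity of one packet.
import os

def pad_to_capacity(message: bytes, capacity: int) -> bytes:
    """Return message followed by random filler, exactly `capacity` bytes long."""
    if len(message) > capacity:
        raise ValueError("message exceeds payload capacity")
    return message + os.urandom(capacity - len(message))

# Example: a 160-sample G.711 payload at one hidden bit per sample
# carries 160 bits, i.e. 20 bytes.
padded = pad_to_capacity(b"hi", 160 // 8)
```

The same helper called with an empty message would produce pure filler for packets that carry no data, which is the extension suggested in the answer.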