It's entitled Cracking Net2Phone and Other Voice over IP Technologies. One difference on this particular slide from the one you have: if you need to contact me or email me after this presentation to ask any questions, please use teamor at netwitness.com.

Today, basically, I'm going to show you that even though we are going extremely digital these days, I mean, you probably have voice over IP in your home, the one thing you need to know is you're not safe from a wiretap just because it's the latest and greatest technology and it's all digital. One thing I'm going to show you today is that you can actually capture voice over IP off the network, and you can also reconstruct, reassemble and replay it. We're going to try to, I'm going to show you a good equation for do line up. We're going to try to do it anyway. I couldn't get the audio to exactly work. I don't have a direct feed, but I'm going to try anyway to hold the mic up to the laptop and we'll see if that works or not. What you should take out of this presentation is that if you're at home and you have voice over IP, or maybe you're at work, you might want to be scared if you see the unmarked white van parked outside, because they're still going to be able to run.

This is Net2Phone. This is from a packet capture: when you hit the dial button in the Net2Phone application, this is the packet that gets sent out. If you look at the part marked in red, that's actually the encoded phone number. So it's not passed in clear text, but it's actually very easy to decode the phone number. It took me a few hours, but after doing a few operations, the phone number eventually popped out. And I was going to go ahead and present the equation for that. Unfortunately, Net2Phone got to me before I could present this. I was on the phone yesterday with the Vice President of Architecture at Net2Phone, and he informed me that this is proprietary information and therefore protected by law.
So we're not going to be able to show you this extremely simple operation to expose the phone number. Moving on. Oops. I just had a PowerPoint malfunction. I'm sorry.

So we're going to go right into how to capture and replay voice over IP. Most voice over IP products use what's called RTP. That's the Real-time Transport Protocol. It's very simple. You're looking at the header. This is what's at the beginning of every RTP packet, and there are two things in it that we are interested in.

The first one is the payload type. The payload type tells you what audio codec was used, and that's the codec you're going to use to decompress and replay the audio. Most of them are pretty standard these days. On a lot of different products outside of Net2Phone, I've seen them use the common ones like u-law or ADPCM or G.723, and these have all been defined by the standard. They're all static mappings based on this payload type. There is also a dynamic range within the payload type that is left to the vendor, and the vendor can use whatever they want there. In that case you would just have to try various codecs, or contact the vendor to get their decompression algorithm and apply it. But most of the products use some of the standard ones that are available to you.

The other thing that's important is the timestamp. The timestamp is important because in voice over IP communications you do not transmit silence. In order to replay the conversation from a packet capture, the timestamp becomes important because we need to add the silence back in.

There are five steps to do this, and it can be done from any packet capture, maybe using tcpdump or something like that. My demo today was done using WinDump, since this is a Windows machine, just capturing packets from a Net2Phone conversation. These five steps are actually fairly generic in that they should work for any product that is not encrypting its RTP traffic.
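
As an illustration of the header fields described above, here is a minimal Python sketch that pulls the payload type and timestamp out of the fixed 12-byte RTP header. It is not the code from the talk. The names `RtpHeader`, `parse_rtp_header` and `codec_name` are hypothetical, and the table lists only a handful of the static payload types from the standard RTP audio profile.

```python
import struct
from typing import NamedTuple

# A few static payload-type assignments from the RTP audio/video profile.
# This table is deliberately incomplete; it only covers codecs named in the talk.
STATIC_PAYLOAD_TYPES = {
    0: "PCMU (u-law)",
    4: "G723",
    5: "DVI4 (ADPCM, 8 kHz)",
    8: "PCMA (A-law)",
}


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    payload_offset: int  # index of the first audio byte in the packet


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the fixed RTP header at the start of a UDP payload."""
    if len(packet) < 12:
        raise ValueError("packet is shorter than the fixed RTP header")
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F
    offset = 12 + 4 * csrc_count  # skip any optional CSRC identifiers
    if first & 0x10:  # extension bit set: skip the header extension too
        if len(packet) < offset + 4:
            raise ValueError("truncated RTP header extension")
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    # The top bit of the second byte is the marker bit, so mask it off.
    return RtpHeader(first >> 6, second & 0x7F, sequence, timestamp, ssrc, offset)


def codec_name(payload_type: int) -> str:
    """Map a payload type to a codec label; 96-127 is the dynamic range."""
    if 96 <= payload_type <= 127:
        return "dynamic (vendor-defined)"
    return STATIC_PAYLOAD_TYPES.get(payload_type, "unknown")
```

A capture tool would call `parse_rtp_header` on each UDP payload and use `payload_offset` to slice out the compressed audio that follows the header.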
I've tested it on a lot of different applications as well as hardware devices, and it works fairly well. The five steps are: we reassemble the application streams, we decompress the audio, we fill in any silence gaps, we adjust the starting time, and then we mix, and we can play it back through our speakers. With the original source being a packet capture, we can replay that and hear the conversation unfold. We're going to go into each of the five steps in detail.

The first one is reassembling the application stream. Since RTP is UDP based, we're going to reassemble the UDP streams based on the IP/port to IP/port combination. I basically want to create two linked lists. I want to capture all the packets that are going to the server, and likewise I want all the packets coming back from the server, so I have both sides of the conversation and I can manipulate them. So we end up with two linked lists, two streams, and we work from there.

Then the second step: once we have those linked lists, we go through each packet in the list, look at the payload type to see what codec was used, and decompress it. In this example the payload type was four, and four means G.723, so we call that codec to do the decompression. Now our two linked lists are filled with uncompressed audio and we can move on from there. If you're a Windows developer, they actually make it rather easy. You can just use the ACM API, the Audio Compression Manager, and you'll be able to query any codecs that are installed on your computer and do this.

All right, the next step is we need to fill in those silence gaps, since voice over IP doesn't transmit the silence. Now, the timestamp is kind of vendor specific, because the RFC doesn't really tell you anything other than that this is for the vendor to use as a timestamp.
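
The reassembly step described above (step one) can be sketched roughly as follows in Python. This is an illustrative assumption, not the NetWitness implementation: it uses ordinary Python lists where the talk uses linked lists, and it assumes the UDP packets have already been extracted from the capture into a hypothetical `UdpPacket` record.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

StreamKey = Tuple[str, int, str, int]  # (src ip, src port, dst ip, dst port)


class UdpPacket(NamedTuple):
    """Hypothetical record for one captured UDP packet."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes


def reassemble_streams(packets: List[UdpPacket]) -> Dict[StreamKey, List[bytes]]:
    """Group payloads by direction, preserving capture order within each stream."""
    streams: Dict[StreamKey, List[bytes]] = defaultdict(list)
    for p in packets:
        streams[(p.src_ip, p.src_port, p.dst_ip, p.dst_port)].append(p.payload)
    return dict(streams)


def conversation_pair(streams: Dict[StreamKey, List[bytes]],
                      key: StreamKey) -> Tuple[List[bytes], List[bytes]]:
    """Return (packets sent in direction `key`, packets sent the other way)."""
    reverse = (key[2], key[3], key[0], key[1])
    return streams.get(key, []), streams.get(reverse, [])
```

Keying on the full source and destination pair keeps the two directions apart, which is what gives you the two lists, one per side of the conversation.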
It could be in any time interval. In my example I'm just choosing milliseconds, but it's up to the vendor to choose any time interval they want. The first packet can start at any arbitrary number, so that doesn't really tell you anything either. So really the only information you can get out of this is the difference in time between each packet and the next, and that's important.

The first thing you do is cycle through the linked list and find the minimum time interval between packets, and we're going to use that value. In this example, 300 milliseconds was the minimum time interval between packets. Then, if you look between the first and second packet, we notice a difference of 600. So what we have there is a missing packet. We either missed it, or there was silence and it wasn't transmitted. Based on that, we know we need to add in one packet of silence. Likewise, between the second and third packet we have a time interval increase of 900, whatever 900 is. Since 300 is the minimum between packets, and it's always a multiple by the way, that tells you we need to insert two more. So the end result in this example is that we add in those packets to put the silence back. Otherwise, if you were to play back this stream, all you would hear is non-stop talking. There would never be a break. That's especially important when you're doing what we're doing, mixing both streams together so that you hear the full conversation.

We're not done yet. Even though we have padded everything and filled in all the silence gaps, we need to adjust the late-starting stream. What I mean by that is, if you do a capture of RTP, typically you're going to see one side transmitting before you see the other side coming back to you.
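
The silence-filling step described above can be sketched like this in Python. It is a minimal illustration under stated assumptions rather than the talk's code: the function name `fill_silence_gaps` is hypothetical, every frame is assumed to decode to the same number of bytes, and the audio is assumed to be signed PCM, where silence is all zero bytes.

```python
from typing import List, Tuple


def fill_silence_gaps(frames: List[Tuple[int, bytes]]) -> List[bytes]:
    """Insert silent frames wherever the timestamp jumps by more than one step.

    `frames` is a list of (rtp_timestamp, decompressed_audio) in capture order.
    """
    if not frames:
        return []
    # Differences between consecutive timestamps, allowing 32-bit wraparound.
    deltas = [(later[0] - earlier[0]) % 2**32
              for earlier, later in zip(frames, frames[1:])]
    positive = [d for d in deltas if d > 0]
    if not positive:
        return [audio for _, audio in frames]
    step = min(positive)                 # the minimum interval between packets
    silence = bytes(len(frames[0][1]))   # one frame of zeros (assumed signed PCM)
    output = [frames[0][1]]
    for delta, (_, audio) in zip(deltas, frames[1:]):
        missing = max(delta // step - 1, 0)  # frames that were never sent
        output.extend([silence] * missing)
        output.append(audio)
    return output
```

With timestamps 0, 600, 1500 and 1800, the minimum interval is 300, so the function adds one silent frame after the first packet and two after the second, matching the 600 and 900 gaps in the example.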
So there is a time lag between those two streams and we need to adjust for it. Otherwise, when we mix both streams to hear the conversation, you're going to find everything's out of sync. The streams would be in sync played individually, but when you try to mix them together you're going to find that people are talking over each other, and you're not going to be able to understand the conversation very well. In this example you can see we have our stream one and stream two, and we notice that the first packet of stream two was captured two seconds after the first stream started. So what we need to do at that point is put in a little silence at the beginning of the late-starting stream, which is stream two. We compare the two, figure out how much silence we need to add, and add that to the second stream.

So where we're at right now: we've reassembled all our packets, we've decompressed all the audio, we've filled in the silence gaps between the packets in our linked lists, and now we've adjusted the starting time of one of the streams. Theoretically, if I played both streams at the same time, you should be able to hear the conversation unfold fairly well. You should hear someone talking and then someone responding, and it should mix together very well. That's exactly what the final step is: we're just going to mix it.

Mixing two uncompressed audio streams together is a rather simple operation. In fact, you're just adding the 8-bit or 16-bit quantities together to produce one single stream. You can play that back through your speakers, or you could save it off as a WAV file. It's just uncompressed audio at this point.
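
The last two steps, adjusting the late-starting stream and mixing, can be sketched in Python as below. This is an illustration under assumptions, not the talk's code: the names `pad_late_stream` and `mix_pcm16` are hypothetical, the audio is assumed to be mono 16-bit signed PCM in the machine's native byte order, and the 8000 Hz default sample rate is an assumption. The talk describes the mix as plain addition; the clamping to the 16-bit range is an added safeguard so loud passages do not wrap around.

```python
from array import array


def pad_late_stream(pcm: bytes, delay_seconds: float, sample_rate: int = 8000) -> bytes:
    """Prepend silence so the late-starting stream lines up with the earlier one."""
    silent_samples = int(round(delay_seconds * sample_rate))
    return b"\x00\x00" * silent_samples + pcm  # two zero bytes per 16-bit sample


def mix_pcm16(stream_a: bytes, stream_b: bytes) -> bytes:
    """Add two 16-bit PCM streams sample by sample, clamping to the int16 range."""
    a, b = array("h"), array("h")
    a.frombytes(stream_a)
    b.frombytes(stream_b)
    # Pad the shorter stream with silence so both have the same length.
    length = max(len(a), len(b))
    a.extend([0] * (length - len(a)))
    b.extend([0] * (length - len(b)))
    mixed = array("h", (max(-32768, min(32767, x + y)) for x, y in zip(a, b)))
    return mixed.tobytes()
```

For the example in the talk, stream two would be passed through `pad_late_stream` with a delay of two seconds before both streams go to `mix_pcm16`. The resulting bytes are plain uncompressed audio that could be written out as a WAV file.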
So that's the entire algorithm for how law enforcement could be sitting outside your house capturing your internet traffic, your voice over IP traffic, and then replaying it back later when they haul you off to court for whatever you guys did. I'm not asking.

But now I've got a demo of the actual equation I actually worked on for a product called NetWitness, and I'm going to show you basically how this was implemented. Hopefully you can hear for yourself how it sounds. Again, we didn't get the direct link, so it's questionable.

The first thing I want to show you is not the RTP stream but the Net2Phone dial-up connection. This is when you click the dial button. Right here is the block that I was pointing out before. This is just a hex dump of the first two packets we see, and that's where the encoded phone number was. You can see up here the actual digits that I dialed. So the equation I can't tell you about, it does work. Please don't call that phone number. It's some Jamaican guy. I just made it up, and he probably wouldn't be happy if everyone called him.

All right, now for the good stuff, hopefully. This is the RTP stream. This was step one, where we had to reassemble the packets. You can see the two IP addresses involved here and the two port numbers, and basically we have about 2,000 UDP packets in the linked lists that I was describing. Now we're going to go ahead and perform the other operations and replay the audio, so let's hope it works.

Oh, and you're on Net2Phone right now?
Oh, sounds like a regular phone call to me.
Yeah, I wanted to see if you're going to DEF CON this year.
It's in Vegas, isn't it? Of course I'm going.
Well, guess what, I'm presenting this year and I'm going to demo how to replay.
What conversation are you going to use to replay?
You're capturing this conversation right now?

There you go, how'd that sound?
Thanks, guys. That's all my time. I'll be outside if you want to ask me any questions. Thanks again.