open source stuff, worked on GCC and GDB for 14 years, and on lots and lots of GNU projects for about the last 20 years. My current project is Gnash, which I'll get into. But anyway, I was basically going to do a talk on reverse engineering proprietary network protocols, focusing more on the tools and techniques. This is more a series of slides with me rambling in between, so if you have questions, ask them at the time. I don't think you have to wait till the very end. Basically, I started a nonprofit last year to work on a lot of these issues that I'll be getting into; we call ourselves Open Media Now. We're officially based in Colorado. Uh-oh. This thing's going to change slides on me again. But basically, we have people spread out all over the planet. We do a lot of work on multimedia software along with... all right, I'm going to have to fix OpenOffice. Anybody know how to make it stop advancing automatically? All right. Basically, we also do a lot of fundraising for a bunch of software projects that we maintain, as well as doing a lot of legal work around multimedia codecs and stuff. And we're pretty big on educational software. This is going to drive me crazy. Um, as much as I hate to say it, I'm going to try and fix this very quickly, manually. Amazing. It survived. So one of our initial projects that most people have heard of is Gnash, which is a GNU Flash player. And for a lot of people, if you're on a 64-bit system and you're not running an Intel architecture, Flash support is rather difficult. So anyway, a bunch of years ago we wrote Gnash, and some of this talk is about some of the work we've done on it. Unlike a lot of free software projects, Gnash is almost entirely a reverse-engineered project. Until recently, none of the Adobe specifications were open. So we've actually spent about five years figuring everything out the hard way, but the fun way.
Gnash runs standalone as well as a browser plug-in for Firefox, Konqueror, Mozilla, Iceweasel, and a bunch of different stuff. Standalone-wise, Gnash is actually popular for user interfaces for embedded devices written in Flash, as opposed to just running it for YouTube stuff. We do a lot of work with streaming video, which is sort of why I've had to do some of this reverse engineering lately. We use OpenGL for hardware-accelerated graphics, plus AGG and Cairo. We've done a lot of work on security, a lot of different stuff like that. We also support Ogg Vorbis and Theora streaming for sites like Wikipedia and the Internet Archive, instead of just using all that proprietary stuff. We also have another project that I started last year, called Cygnal, and the work in that project is kind of what this talk is about. I recently cloned the Adobe Media Server, since I already had a Flash virtual machine and a lot more companies like the BBC were shifting to using proprietary network protocols. I thought it would be fun to write my own. And so, unlike some of the other media servers, we have a Flash virtual machine built in. We support patent-free codecs as well as free ones. And although it won't be fully functional in the release coming out in a couple of weeks, it's pretty much working towards full Flash-based video conferencing and groupware types of applications. The last Gnash talk. And Gnash actually does up to version 9 and version 10 these days, although most people never notice it because all the distributions ship old versions. So if you ever have Gnash problems, use a newer version. And we've got a lot of ActionScript implemented, and we've started work on ActionScript 3, which is a gigantic class library, like Java's was. A little bit more on Gnash. Gnash is written in C++, which is sort of unusual for GNU projects, but a lot of things actually translate pretty well to C++. Not everything, but in this case it works pretty well.
We also use the GNU Autotools, which drive people crazy endlessly, but they work pretty well for us. We use Boost a lot. A lot of the Boost classes are being added to the standard library for C++, so we've been using Boost. We support GStreamer or FFmpeg. We support a bunch of different networking protocols, including the one I'm about to talk about. And we support a lot of different GUIs, because we figure people should use the desktop and the GUI toolkits that they're used to. So we support everything from GTK, KDE, and Qt to raw frame buffers and Aqua on the Macintosh. Somebody's even doing Windows support. I don't know why. I guess Windows users need a 64-bit Flash plugin. We're also really big on performance, which is important because we do a lot of support of embedded devices and things: OpenMoko, the Nokia Internet Tablet, a lot of that kind of stuff. We've done a lot of work lately with the XVideo extension for full-screen, high-resolution video. We're doing, I think, at last testing, 1900 by 1200 full-screen video at 10% CPU load, which is really good for a lot of the lower-end hardware. I mean, just because Adobe wrote the Flash Player doesn't mean we can't do a better job. So basically this talk is about RTMP. RTMP is Adobe's proprietary networking protocol that's used by the BBC iPlayer right now and is also used for Flash-based applications. If you've ever used professional video conferencing and groupware software (Spreed is an open-source company in Germany that does this), they're almost all written in Flash. You can write a video conferencing application in Flash and ActionScript in literally 15 or 20 lines of code. The problem was that we wanted support for this protocol, for which there was no documentation available on the Internet, and it was kind of important.
So RTMP is actually used by a whole bunch of different things, but there are a couple of main functions that Adobe uses it for that we needed. It actually supports taking an ActionScript object like a Date class, whether it's in JavaScript or whatever, encoding it in binary, and transmitting it to the server, which may transmit it on to other clients. So you're actually transmitting real objects around, at that encoded-object level. And that was kind of the whole purpose of RTMP: it enables you to more or less transmit Flash programs between multiple applications. The other nice thing, on the server side, is that it lets you actually do real seeking with video, so we're adding support for a thing called Oggz chopping, which is basically server-side editing of video, things like that which you can do with your own server. There's a whole lot of different RTMP protocols that Adobe has developed. They're all basically the same, with just minor variations: mostly some encapsulation, different encryption techniques, UDP versus TCP/IP, and all that kind of stuff, but the protocol itself hasn't changed that much. So basically, people say, you know, why does it really matter? Well, RTMP was originally invented to deal with network congestion. I mean, most people here remember the dial-up modem days, and the problem with network congestion was that you needed sort of a special protocol, so to speak, to handle it, and on the Adobe side it'll drop frames and do all these other really cool things. And the other point is we actually wanted... Huh? Oh, I thought you had a question already. And the other point is that we really wanted to use this ability to transmit encoded ActionScript objects between different instances of Gnash, the media server, and back. And that pretty much was a big enough motivation to go through what was about six months of reverse engineering.
So basically, I guess a lot of people say, you know, why reverse engineer? Why do you need to do anything? Well, that's not really a good option if you're a free software fanatic like myself. In this case, reverse engineering is sometimes the only thing you can do to get access to stuff that you normally wouldn't. And sometimes that access is important. I mean, the first time I showed somebody that I could play a YouTube video on their Sony PlayStation, on a PowerPC, they were blown away. So it's kind of nice to give people access to do things that they normally couldn't. You know, we've tried really hard not to wind up like the FFmpeg project, where they've done some really nice reverse engineering, but because of all the legal concerns nobody will actually ship it, which isn't much fun either. And some of us actually like reverse engineering. It's a bit like, you know, playing 3D chess in Klingon or something. It's just kind of fun once you get the hang of it. So the number one thing, when you're about to launch off on spending months and months on hex dumps, is that if you stare at the hex dumps long enough, they actually do make sense. It's basically a matter of keeping your sanity until you get to the point where it makes sense. So, a couple of generic tricks I've found. I spend a lot of time packet sniffing, which is how you get your raw material. One of the things I've found is to use really large sizes when you're capturing packets. A lot of times, if you have a big enough size, you'll get most of the entire packet in your sniffing tools. And that makes it a lot easier to figure out where things like packet size fields are. For instance, if you don't have the whole packet, you can't tell which field is the packet size, because you don't get the right byte count. Another thing is that a lot of protocols have a lot of ASCII strings embedded. I know RTMP does.
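As a rough illustration of getting the embedded ASCII strings out of the way first, here is a minimal Python sketch that lists every run of printable characters in a captured buffer along with its offset. The function name `ascii_runs`, the `min_len` threshold, and the sample bytes are all made up for this example; they are not from Gnash or from a real capture.

```python
import re

def ascii_runs(data: bytes, min_len: int = 4):
    """Return (offset, text) for each run of printable ASCII of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]

# Hypothetical buffer: length-prefixed strings mixed with binary fields.
sample = (
    b"\x02\x00\x07connect"
    b"\x00\x3f\xf0\x00\x00\x00\x00\x00\x00"
    b"\x03\x00\x03app"
    b"\x02\x00\x04test"
)

for offset, text in ascii_runs(sample):
    print(f"{offset:4d}  {text}")
```

Lowering `min_len` finds shorter strings such as `app`, at the cost of more accidental matches in binary data, so it is worth trying a couple of thresholds on the same capture.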
And so a lot of times just finding all the ASCII strings and getting them out of the way first actually makes all the other stuff look a lot less overwhelmingly complex. I have some examples of that, too. The other thing is you have to expect to write and throw away a lot of code. When reverse engineering, you're basically just guessing. And a lot of times you're just not really sure if you've guessed right. So I'm a big believer in writing a lot of little standalone utilities to do things. And you have to expect to throw them away. You can't assume you're going to get it right the first time, the second time, the third time, or sometimes even the fourth time. But if you really want to accomplish what you're doing, you just have to deal with it: write the code, get it wrong, throw it out, and start on something else. I also find that a really relaxed, low-distraction work environment is important. I mean, I get to work at home, so I have it pretty easy. But I don't know, I get a lot of work done. I make a pot of tea, turn up the stereo really loud, and start staring at hex dumps. It's a great way to spend a rainy afternoon. And the other part is, as I mentioned before, if you stare at the hex long enough, you really do get to see the patterns. I mean, I've spent time in the past on national intelligence projects, and people are so much better at looking for patterns in things. You often just can't write enough code to analyze things automatically, although you do a lot of that too. And so a lot of times you stare at gigabytes and gigabytes of hex dumps, and you start to go, wait a minute, every iteration I see this thing, I wonder if that's important, and sometimes it is. No, it wasn't, with RTMP. And the other thing is it's really hard to get started. I find that you're about two days into it, and you go, oh my God, what am I doing? I mean, this is going to eat my lunch for six months, and that's six months at best.
There's a lot of common functionality across protocols. I basically have to say "most protocols" because there are always exceptions that break the rule. I've actually reverse engineered probably, I guess, two dozen network protocols in the last 30 years. And so there are a couple of basic things I like to look for, mostly just to get me started and figure out what's going on. This will probably sound obvious, but most protocols have a header. Not always, but most of them do, and sometimes things that look like they don't have a header do have one on maybe the first packet, and after that it's like streaming video, so to speak, and it's just raw data. So first you really have to find the header. That's kind of the big deal. Most headers that I've ever worked on have length fields for their fields. I mean, it's really simple. For ASCII strings it'll say, you know, 12, you have 12 characters, and stuff like that. And so that's usually the other first thing you start looking for: the counters. A lot of headers have another field that's the size of the packet, for instance. That's why I like using big packet sizes, because if I've got the whole packet, I can write a quick program to calculate everything and give me a quick byte count, and I can then compare that with some of the stuff that I think might be the header. And I might say, oh, this packet's got 300 characters, and oh, look, there's a 300 here, you know, before the data, so that must be the length field. Because a lot of this is, you have to eliminate the stuff that's easy to find so you can spend time on the really hard stuff. Most protocols have a checksum. RTMP is not one of them. They don't care, which is probably why it's junk. And the other thing is that in a lot of network protocols, all the numbers are in big-endian format. You know, network byte order.
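The "oh look, there's a 300 here" check described above can be sketched in Python along these lines. This is only one way to do it, under the assumption that the length field counts the bytes that follow it and is stored big-endian; the function name `find_length_fields` and the sample packet are invented for the example, not taken from any real tool.

```python
def find_length_fields(packet: bytes, widths=(1, 2, 3, 4)):
    """Return (offset, width) pairs where a big-endian integer of that width
    equals the number of bytes remaining after the field."""
    hits = []
    for offset in range(len(packet)):
        for width in widths:
            end = offset + width
            if end > len(packet):
                continue
            value = int.from_bytes(packet[offset:end], "big")
            # Skip zero so that runs of padding do not count as matches.
            if value and value == len(packet) - end:
                hits.append((offset, width))
    return hits

# Hypothetical packet: two unknown bytes, a 2-byte length of 300, then 300 bytes of data.
packet = b"\xaa\xbb" + (300).to_bytes(2, "big") + bytes(300)
print(find_length_fields(packet))
```

On real captures this will also report coincidental matches inside the payload, so a candidate only becomes convincing when the same offset and width match across several packets of different sizes. This is also why a truncated capture defeats the check: the byte count no longer agrees with the field.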
And although Adobe breaks this, it turns out, with RTMP (some of the fields are little-endian instead of big-endian), most of the time when you're looking at numbers you don't recognize them, because you're looking at, you know, a double in big-endian. So you get to learn that 3F F0 is a 1. What I actually do for that is print out a big chart of all the numbers, 1 to 100 basically, in big-endian format, so I can go, oh, yeah, that's what that is. The other part is how to stay legal. Reverse engineering, specifically in my country, unfortunately, is highly discouraged and borderline legal. So the easy thing to avoid is: don't disassemble executables. I mean, that's tacky anyway, and it's cheating, and it takes all the fun out of it. I know when we started on Gnash, we basically said, you know, it's a Flash player. It has opcodes, it has function calls, it has a stack. Who needs to reverse engineer Adobe's player and get it all wrong? The other big point is that if you use proprietary software, you must have legal copies. This is very interesting, because for RTMP the license forbids me from working on an implementation if I've ever installed any Adobe software. So thank God I have college-age kids who run the Adobe player. But that's actually important: if you do all this work, you want to be able to redistribute it, and if you can't redistribute your reverse engineering work, at that point it's sort of like, why bother? My other advice is talk to real lawyers. I've talked to so many people who think they're legally minded, especially when it comes to issues like DMCA violations. Talk to a real lawyer. They'll tell you how lawyers think. Engineers can't read legal documents, and it makes sense, because none of that stuff is sensible.
Well, I don't know about you guys, but if I read a patent application, I read it very literally, and lawyers read it the way lawyers do, which is actually how you're going to have to deal with it, not how we see it. A common approach that we haven't had to use with the Gnash project is the typical clean room: one person writes a specification, and whoever wrote the spec may be doing a lot of strange things to get it, but by having a separation from the people actually implementing it, you're just doing a work-alike product instead of cloning the original. A lot of projects have done this in the past. I mean, for some of the original Phoenix BIOS stuff, they put everybody in a hotel room for six months to write specs, and then they gave the specs to the developers who actually wrote the code. And then I guess the last thing is, don't live in the United States. Unfortunately I do, so I'm extra careful, but I live in a good part of it. There's a lot of different capture tools. I think everybody has different things that they prefer. I'm sort of a dinosaur, so I like tcpdump. tcpdump works really well for me. It's got a lot of different formats and things like that, so I probably use that as my primary tool. ngrep is another one I got turned on to last year, and it kind of greps a network connection. It's very similar to tcpdump, with a slightly better user interface for a command line tool, and it's sometimes very useful. I've also found that having multiple tools, and seeing how they display things differently, helps me see the patterns. Sometimes things look better in octal than in hex. Sometimes, with the way different sniffing tools display data, you go, oh, that's what that's really doing. And so I kind of suggest using multiple tools. The other fun one is Wireshark. In the case of RTMP, Wireshark had no RTMP support. It does now, but they had to wait until we figured it out before they could add support for it.
But it's actually really good for sniffing even a protocol that you don't understand. It's really nice that you can pick things by port number, destination, source, and a lot of other stuff. And so I actually use Wireshark a lot for just capturing raw binary dumps. It's pretty useful. And now that they have RTMP support, we basically get to check each other's work. I spent six months figuring it out and they had it added to Wireshark in a week. It was crazy. Other display tools: I use od. I bet most people here have never even heard of the od utility. Okay, well, whatever works. But od is also nice because it lets me, as I say, display data in a lot of different ways. And sometimes I like having things in a terminal window. GDB, believe it or not, is very useful when you're writing an example to try out how you think some part of the protocol works. A lot of times you'll jump into GDB and say, what would happen if this number is an offset, and you can do simple pointer arithmetic and display the data in a buffer that might be a network packet. I find GDB actually pretty useful. Of course, I like GDB. Most people don't, but that's another story. ngrep again, and of course tcpdump. Both of these do a lot of different output. I don't find myself editing hex that much, but sometimes you do. Sometimes when you think you know what something's doing, you want to change it. Like if you have a byte count for a string, you might say, well, I'm going to add a character and then change the length and see if I broke anything. So when I actually have to edit hex, I use GHex2. It's probably one of the only GUIs I actually use all the time, and it's pretty useful for that. Emacs, believe it or not: I use Emacs for editing binary files a lot. You can do a lot of things in Emacs. Funnily enough, even edit software, I've been told. But it's pretty good for binary dumps and other things.
Vim, you can use Vim. It's a little more painful. And there are so many different editing tools for hex: bvi, BEAV, CRiSP. There's a whole bunch of them out there. I think it's pretty easy to write an editor. So basically, GHex2, this is more or less what it looks like. I tried to use the same packet here. This is actually an encoded object, basically. It says hello world, of course, and it's a string. And GHex is kind of nice because you can get all your offsets and switch the encoding a little bit. Emacs binary mode actually works pretty well. I use this pretty often, actually. I don't know, it's kind of useful. GVim doesn't work quite as well, but if you're a big vi fanatic, you can actually edit binary files with vi instead of having to learn some other new tool. I find this the least useful, but some people are actually telling me they really like this mode. Which one? Oh, okay. Yeah, I mean, more tools is always good. I actually did this one a lot: silly editor tricks. A lot of times I'll take a hex dump, like from GHex for instance, and put it into a scratch buffer in Emacs, or in this case OpenOffice so I could do color. And a lot of times I just put carriage returns where all the fields end. It's a lot more sensible that way. I mean, if this echo hello world thing was laid out like that earlier slide, it would be a little hard to figure out what it does if you didn't have the text on the side. And so here I've got it all broken down into the different fields for all the strings, and you can sort of see how there are byte counts and other things like that in it. And I did this a lot. I spent a lot of time just hacking things up. So tcpdump's got a huge wealth of options.
These are the main ones that I use: of course, which device, and the port. -X, which mixes ASCII and hex at the same time, is very useful, because rather than looking for all the hex stuff later, you can sort of see it in the dump and go, ah, that's what that is. And then -s for max size: a big packet size is always good. More or less, this is what tcpdump looks like. Pretty boring. This is an HTTP message, just so there'd be more text to show things that were interesting. And you can see that if you're just looking at the hex, this looks like a mess, but with the ASCII you can quickly go, oh, I know what this is. This is just a fancy GET request from my web browser, basically. ngrep has actually got very similar options to tcpdump. They change things like the device option, silly things like that, but a lot of the other options are the same. And ngrep looks more like this. A lot of the output's cleaned up. It's a little bit easier to read sometimes than tcpdump. And this is actually a decoded RTMP packet right here. You can see, because it has a lot of text in it, that this is the result message from connecting the network connection to the server side. od has got a bunch of different options, maybe not as many as tcpdump or ngrep. -A is for the address; I actually pass none all the time to get rid of addressing, because half the time I don't care. And then you can change a bunch of different size things and their widths and stuff. od is still kind of useful, as much as the other tools. Here are a couple of varieties of different od output. I think my main use of od is actually for doing test cases. I can take, for instance, that last line there and cut and paste it into a test case, and it's a lot easier than trying to pull it out of GHex or whatever. And of course, Wireshark, if you really like the GUI world. I'm kind of a command line guy, but Wireshark is actually a pretty nice little tool.
I've been using it more and more lately and have kind of gotten used to it. One quick example of GDB: basically, as I said, once you've got a packet that you've read from the network into memory under GDB, you can do all sorts of different things to it. In this case, I'm just dumping it as raw hex. You can dump the characters out. You can do an amazing amount of stuff. I mean, GDB is like Emacs. You can pretty much do everything but, you know, walk the dog and get you a beer from the refrigerator. And I did this an awful lot with my test code too, sort of experimenting with things, changing values, stuff like that. So basically, this is what an initial RTMP header looks like. This talk's not going to get into the details of RTMP that much, although I can answer some questions on it. But the first thing I had to figure out was the header for all the data. And in the RTMP world, it turned out to be pretty complicated, because they're really into bit fields. I mean, bit fields in headers are kind of evil, and it made it a lot more complicated to figure out what was going on. So for instance, in this first one here, the first two bits are the size of the header. They have 4-byte, 8-byte, and 12-byte headers, and the first two bits determine what size the rest of the header is. This was a pain to figure out. That's followed by what's called a channel number. RTMP supports 64 channels in a single network connection. So if you're doing video conferencing and you had four windows open, each of those would be a separate channel, and all the messages are multiplexed in transmission. This was also a pain to figure out. Once I figured it out, it was easy, but the first bit's always really hard. All messages in RTMP have a timestamp. Basically, they use the timestamp for keeping audio and video synchronized, because your audio and video packets show up at different times and are interleaved together.
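The bit-field layout of that first header byte can be sketched in Python as follows. The split into two size bits and six channel bits follows the description above; the specific mapping from the two-bit value to a header size (0 to 12 bytes, 1 to 8, 2 to 4, 3 to the 1-byte continuation header) is not spelled out in the talk and is taken here as an assumption from the RTMP specification Adobe later published. The names `HEADER_SIZES` and `parse_first_byte` are made up for this example.

```python
# Assumed mapping from the top two bits to the total header size in bytes.
HEADER_SIZES = {0: 12, 1: 8, 2: 4, 3: 1}

def parse_first_byte(first: int):
    """Split the first header byte into (header_size, channel)."""
    size_bits = first >> 6   # top two bits select the header size
    channel = first & 0x3F   # low six bits: channel number, 0 to 63
    return HEADER_SIZES[size_bits], channel

for byte in (0x03, 0x43, 0x84, 0xC3):
    size, channel = parse_first_byte(byte)
    print(f"0x{byte:02x}: {size:2d}-byte header, channel {channel}")
```

Six bits give exactly the 64 channels mentioned above, which is a useful sanity check when guessing where a bit field ends.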
And then on the Flash Player side, you have to look at the timestamps when you figure out how to display things, and that keeps it synchronized. They also use this timestamp in a lot of different ways in RTMP. That's how they detect network congestion: when the timestamps start changing and there's too much drift between them, they realize that there's congestion and they drop frames or whatever else. Then there's also, in this case, the size of the message. As I said before, most headers actually have a size for the whole packet, and they often use a short or an integer for this. So that's just the size of the whole packet. When you sniff with the big packet size, that was pretty easy to find. I wrote a program that read everything and then just went backwards until it found the number that was the size of the bytes it had just read. Pretty simple, but effective. Also, RTMP supports about 17 different data types, each one a particular type of ActionScript object. In this case, this is what's called a 14, which is a remote invocation call. So basically you can actually load executable code on the media server and then make remote calls on all the methods of that object, which is the whole point of RTMP. There's another field they call source, and it's kind of funny. It just says whether the message has come from the server or from the client. Beats me why they even bothered to waste all those bytes, but they did. So here's a little bit more on the RTMP headers. They also have this thing they call a continuation packet. Most of the packets in RTMP, it turns out, were limited to 128 bytes. But often, you know, a video or some object is way more than 128 bytes.
So they have this thing where they'll send you a 12-byte header, and then all the following messages needed to complete the data packet have one-byte headers, using the same channel number and a different bit field value for the one-byte header. This is a real pain in the neck, because suddenly the header size has changed and you've got multiple channels of data in the same hex dump. It was pretty brutal. AMF you'll see a lot of. AMF is sort of the encoding for ActionScript objects. An ActionScript object is like a Date class, where you can print out the date formatted in 20 different ways, or in Flash they have a Video class, and then there's a Microphone and a Camera class and things like that. ActionScript is based on ECMA-262, ECMAScript, which is the same specification JavaScript is based on. It's just that Adobe didn't really catch up with the spec until about six months ago. It only took them 15 years. There's a new version of AMF. The original turns out not to be very compact. For instance, they've got a channel number which can be 1 to 64, but they use a double that eats up eight bytes to transmit channel numbers, which is kind of stupid. So in AMF3 they decided to try to improve things, because Flash has had big performance problems, and Adobe Flash version 9 tried to fix a lot of these problems. So they actually did things like adding an integer type and a lot of stuff like that. I haven't actually seen anybody using AMF3 on the Internet in the wild, but it'll probably happen more and more. So an AMF packet, you know, more or less looks like this. This is a pretty simple one. It's got the type field in blue on the one side, then the length field, and then the actual string. A lot of protocols don't use null terminators, which most C programmers are used to. So it's something to think about: you actually have to read the byte count, and you can't just read till you hit a null. I'm big on testing.
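To make the continuation scheme described above concrete, here is a minimal Python sketch that rebuilds one message body from a single channel by stripping the one-byte headers that appear after every 128 bytes. It assumes the 12-byte header has already been removed, that no other channel is interleaved in the buffer, and that a continuation byte has its top two bits set with the channel number in the low six bits; that last detail comes from the RTMP specification Adobe later published rather than from the talk. The name `dechunk` and its parameters are hypothetical.

```python
def dechunk(data: bytes, msg_len: int, channel: int, chunk_size: int = 128) -> bytes:
    """Reassemble a message body of msg_len bytes from chunked data on one channel."""
    body = bytearray()
    pos = 0
    while len(body) < msg_len:
        take = min(chunk_size, msg_len - len(body))
        body += data[pos:pos + take]
        pos += take
        if len(body) < msg_len:
            # More data follows, so the next byte must be a continuation header.
            if data[pos] != (0xC0 | channel):
                raise ValueError(f"expected continuation header at offset {pos}")
            pos += 1
    return bytes(body)

# Hypothetical 300-byte message on channel 3, split into 128 + 128 + 44 bytes.
message = bytes(range(256)) + bytes(44)
wire = message[:128] + b"\xc3" + message[128:256] + b"\xc3" + message[256:]
print(dechunk(wire, 300, 3) == message)
```

With several channels in the same capture the buffer would first have to be separated per channel, which is the part the talk describes as brutal to work out from hex dumps.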
Testing is, I think, the number one thing in almost any software project. And in this case you really need it, because you have no other way of knowing if you're encoding or decoding correctly. So I do a lot. I mean, I've probably got 4,000 test cases for RTMP stuff. And those test cases are mostly the only way I could get any work done, especially because I can't run the Adobe Flash player. So often my test cases are the only way I can continue munging through encoding and decoding stuff in a controlled, legal manner. You can do a lot with disk-based files like that, but whatever works. Testing is really important. And I often start with the test cases and the test suite well before really getting into the real implementation of the code. We're really big on DejaGnu, which I actually helped write. So this is one of our test cases, for instance, for decoding numbers. And this is kind of how I like to do it. I used od to get the binary as hex, cut and pasted it into a memory buffer, and then I can run my parsing function and walk through the data structure returned from parsing the hex, examining all the different elements of it to make sure that it was parsed correctly. The nice thing about this is that I have thousands of these, one or several for every different data type I could ever support, nested data types, arrays of custom objects, and tons of stuff. And when I make a little change, I can run the test suite and see really quickly if I broke something major, which is really useful. Without that good testing, I would make a subtle change and sometimes not know for days or weeks whether I'd actually broken anything. The other thing is, depending on which tools you're using for sniffing, it's hard to know where your data actually starts. Once again, this is an HTTP message, to make it easier, but more or less the first 52 bytes are all the Ethernet and TCP/IP header overhead crap.
And so ngrep leaves all that off, which is kind of nice, but you pretty much have to get to where the actual data starts, or you're trying to decode an Ethernet header as, like, a video packet header. So here's a classic example of a hex dump that I'd grabbed. You know, you start on this thing and you go, you've got to be kidding, there's nothing in here. I mean, I look at this and I can see the number one in big-endian format and a bunch of other stuff. So the first thing I usually do is look for the ASCII strings. In this case, I'm just color coding them. But basically, once you decode all the ASCII strings, three quarters of the packet is gone. So what's left is actually not as complicated. It's not exactly straightforward, but in this particular packet it wasn't that complicated. After finding those, I go find everything else, which in this case is probably mostly just numbers instead of more complicated objects. Here, I went for the type fields, in blue. 02 is the AMF data type for an ActionScript string. 05 is a null object, which is just kind of a placeholder object. You know, like saying data could live here. I don't know why they do it, but they do. And then you can see where I've got the length fields marked out in green. Numbers, of course, don't have a length field. These are basically the numbers. The data type field is a zero for a number, which gets confusing, but you get used to looking for nine-byte numbers and stuff. The 3F F0, as I said, is a one in big-endian format, sitting on a little-endian machine. And then basically, after loading it into Emacs and adding nice colors and stuff, this is what that packet turns into. See, that wasn't that hard, was it? That's what it looks like now, anyway.
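The three type codes walked through above are enough for a small decoder. This Python sketch handles only those three: 00 for a number (an 8-byte big-endian double, which together with the type byte gives the nine-byte numbers mentioned), 02 for a string (assumed here to carry a 2-byte big-endian length), and 05 for null. The function name `decode_amf0` and the sample bytes are invented for illustration; this is not Gnash's parser, and anything outside these three types is rejected.

```python
import struct

def decode_amf0(data: bytes):
    """Decode a flat sequence of AMF numbers, strings, and nulls into Python values."""
    values = []
    pos = 0
    while pos < len(data):
        marker = data[pos]
        pos += 1
        if marker == 0x00:    # number: 8-byte big-endian double, no length field
            (number,) = struct.unpack_from(">d", data, pos)
            values.append(number)
            pos += 8
        elif marker == 0x02:  # string: length prefix, then bytes, no null terminator
            (length,) = struct.unpack_from(">H", data, pos)
            pos += 2
            values.append(data[pos:pos + length].decode("utf-8"))
            pos += length
        elif marker == 0x05:  # null placeholder, no payload
            values.append(None)
        else:
            raise ValueError(f"unhandled type 0x{marker:02x} at offset {pos - 1}")
    return values

# Hypothetical bytes: the string "connect", the number 1.0 (3F F0 ...), then a null.
sample = bytes.fromhex("02 00 07 63 6f 6e 6e 65 63 74 00 3f f0 00 00 00 00 00 00 05")
print(decode_amf0(sample))
```

A buffer like `sample`, pasted from a hex dump, is also the shape of test case described earlier: parse known bytes, then check each decoded element.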
And that's the other reason I like to use Emacs a lot. You know, when I originally did that, I put that hex dump in a scratch buffer, added carriage returns, and broke it all down, and it made more sense that way. So basically, people say, how can you help? You know, along with reverse engineering and stuff, we have a lot of software that's actually using the work that we're doing. So, you know, translations to other languages for our Gnash player are really good. People who can do compatibility testing, who actually run the Adobe thing, are very useful. Bug reporting, documentation, maintaining a build farm; funding is always really nice. Every once in a while, people send me free beer, which is kind of funny. And then I just want to give a couple of thanks to the people that sponsored all this work that I've been doing for the last two years here, basically. Without these people, I wouldn't be able to put all this time into reverse engineering any of this stuff. So, questions, or more explanations? Funny enough, the day after I did a version of this talk a few weeks ago, they announced that they were going to release the specs, but the licensing agreement forbids you from using the specification to write your own implementation. Actually, having already written an implementation, I'm not really sure what that means. But once I get back to the States, I'm going to have to ask the EFF about that in more detail. I look at it the same way when ... Basically, the question was, Adobe announced that they're going to release their specifications for RTMP. And basically, no. I mean, I already published my version of their docs like two years ago. And I think it's a lot like when they released their ActionScript specifications last year. We had already figured that out. It's basically too little, too late.
I mean, I guess it's good that Adobe's trying to be more open source and doing the right thing and all this other kind of stuff, but at the same time, their licensing clauses are hell. They have the same thing for the ActionScript specifications. Until recently, if you read the spec, you couldn't write a Flash Player legally. I think that's illegal in EU countries, but in the US, unfortunately, it's binding. His question was, how do I deal with encryption in protocols? And unfortunately, there's this thing called the DMCA that actually forbids me from answering your question. Typically, I mean, reverse engineering encryption protocols is sort of a subspecialty. I happen to have a couple of friends where that's their thing, and I think they use very different techniques than for doing network protocols. But in the RTMP case, they actually have a new encryption scheme they've just added called RTMPTE, which uses some 128-bit key. And one of these days, I'm actually thinking of tearing into that. But I have to keep talking to the lawyers before I start, since I do like to go back to the States once in a while. He's not illegal. Doing reverse engineering is there. Positioning whatever you have to do, will be there as they come back. Basically, the question was, what about going to a country that doesn't have these stupid laws like we're stuck with, doing all the work there, and then publishing it? I spend enough time over here that I almost feel like that's what I'm doing. But the problem is that when I go back to the United States, I am required to abide by those laws. We've seen in some of the past DMCA cases where large companies have attracted people from Russia, Norway, and a few other places to the United States and then dropped a DMCA violation on them. So while that's not a bad idea, along with doing the work in the right country, I think the better approach would be to find a developer who actually lives in that country to do a lot of that work.
And I think this is actually one of the reasons that Gnash has so many European volunteers, because most everybody in the United States has signed the Adobe license agreement and installed the Adobe plug-in, and then actually can't work on any of our software. And a lot of the folks in Europe and Italy and EU countries and stuff, they don't really care about this thing, because it's not against the law there. So I think that's a good idea. For me personally, it's just not real practical, because I do like to go back once in a while. Yeah, the question was how do I figure out checksum algorithms? With a lot of checksum algorithms, it seems like most protocol writers use the simple way of just adding up every byte. And the trick is that sometimes, depending on the size of the checksum field, that'll roll over. But most everybody, literally, they're just adding up the bytes. I know there are other ways of doing it, but that's the one I've seen probably 90% of the time. And then yeah, that's where you start writing test code. You think you know what the checksum is, so you'll write something that'll add up all the numbers and print it out as one-byte, two-byte, three-byte, and four-byte checksum quantities, for a number that's this big and it rolled over once, kind of thing. More than that, it can be kind of complicated, but I've been lucky: I haven't really seen any other checksums used, not in the real world. I'm sorry, I can't... He basically asked how do we make sure that our volunteers are doing good, sort of clean-room reverse engineering. One of the ways we actually have to do this is that nobody checks in code to our source code repository unless it's gone through the core developers. Typically we even have to ask questions about some patches and things to make sure that their source came from something clean or that they figured it out themselves. We get a lot of patches that are just bug fixes. That stuff's easy, we just put it in.
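Going back to the checksum answer for a moment, the kind of throwaway test code described there might look like the Python sketch below. It assumes the simple additive scheme from the talk (sum every byte, let the total roll over at the width of the checksum field). The function names are invented for this example.

```python
def additive_checksums(data: bytes) -> dict:
    """Sum every byte and wrap the total to 1-, 2-, 3- and 4-byte fields."""
    total = sum(data)
    return {width: total % (1 << (8 * width)) for width in (1, 2, 3, 4)}


def matching_widths(data: bytes, observed: int) -> list:
    """Return the field widths (in bytes) whose wrapped sum equals `observed`."""
    return [width for width, value in additive_checksums(data).items()
            if value == observed]


# Two bytes that sum to 300: a one-byte field rolls over once (300 - 256 = 44),
# while wider fields hold the full total.
sample = bytes([200, 100])
for width, value in additive_checksums(sample).items():
    print(f"{width}-byte checksum: 0x{value:0{2 * width}x}")
```

Comparing the value seen on the wire against each width with `matching_widths` quickly shows whether the guess is right and how wide the field is. An empty result means the protocol is probably using something other than a plain byte sum.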
But for most of the reverse engineering stuff, to be honest, I don't think we've actually ever really had any contributions from anybody. I've done most of the hardcore reverse engineering, and all of our other contributions are actually on all the other parts of the project, because beyond reverse engineering a network protocol, there's a little bit more to the software than just a protocol. So a lot of our contributions are GUI layers and graphic morphing and different renderers and a lot of that kind of stuff. But I think if people got into the reverse engineering, yeah, I'd have to ask where they found it out, because if they had ever read the specs or read an internal Adobe document, I can't use it even if it works. Okay. Yeah, I think basically the question was, is it against the law to sort of say, oh, we'd like this protocol to be reverse engineered, and hope that somebody does it and contributes it from sources unknown. I think the problem now is you'd actually have to know where it came from. Although the idea of, like, oh, well, somebody reverse engineered this, I don't know where it came from: I see that actually all the time. I mean, people offer projects to me like that pretty frequently. But as far as getting somebody to sort of anonymously donate a reverse engineered implementation, you'd really need to know where it came from. And typically a lot of reverse engineering is not exactly a short-term project anyway. You almost need funding to do it, because you want to do it all day, every day, until it actually starts making sense. Some of this stuff's really hard to do, and there are probably a lot of protocols that I would love to see reverse engineered. I've been mostly focusing on the Adobe ones. I wouldn't mind somebody, you didn't hear this from me, working on the encryption part of RTMP.
But you didn't hear that. Basically the question was about using WikiLeaks, in a sense, for leaking technical documentation, and the legalities and the licensing around it. That actually happened with Flash. It didn't go through WikiLeaks, but a lot of people just took copies of the ActionScript specification and dropped them on the Internet. Other people copied it from them. So typically, yeah, if you've actually gotten access to it legally and it's gone through several hands, most of the lawyers I've talked to consider that legal. At the same time, when I actually found copies of the specification, we had already figured that part out, because we just used the ECMA-262 spec initially. But I think that is a good way: if people have knowledge of stuff that's not illegal to release at all, WikiLeaks might be as good a place for it as anything else. Yeah, he asked about doing reverse engineering of drivers. I've actually reverse engineered a lot of device drivers. It's even more painful than doing network protocols, because you typically need to know things about the hardware. A classic example: I was sitting down with a friend last week, and in an appendix of a reference for the USB spec there's one field that these guys found that actually turns off the USB control, and it solved all the problems they were having. And sometimes finding that is a matter of somebody spending like a couple of months reading documentation. But yeah, drivers are hard. Same question. As I said, this talk was mostly on network protocols. I've done mostly network protocols and device drivers, so that's kind of my background. I think file formats get really complicated. I mean, I look at stuff like OOXML, with a 7,000-page specification which can't even be implemented, and things like that. And I think the difference with a lot of file formats is that network protocols are usually pretty simple by design, and what you have to figure out is kind of more limited.
And with file formats, people obviously do it. I mean, ODF works, and OpenOffice reads Word documents, and things like that. It uses some of the same tools but probably slightly different techniques, because I don't think they worry as much about things like header fields and, you know, byte counts and checksums in, for instance, a word processing format. At the same time, you know, hex is hex, and you can still analyze it and figure out what you can. Yeah, that's another one. Adobe patented RTMP. My implementation came out before the patent was granted, and so I actually have to talk to a real lawyer pretty soon and find out if I can actually release the software that I just wrote. You got to love software patents, barely. And then they've also announced that they're going to release their spec, so, I don't know, we'll see what happens. I actually contacted Adobe about that, and they claim they're becoming a more open source friendly company, but their license agreement still sucks, so I'm still waiting. So you said, how long do I spend reverse engineering versus... How big a part of the project is reverse engineering? Oh, how much time do I spend doing the legal? I'm having a hard time hearing. So how much time do I spend dealing with the legal side of things? Typically, the legal side is visiting lawyers for an afternoon here and there, talking about stuff in great detail, and then not necessarily having to talk to them again. For instance, I've memorized all the legal reverse engineering clauses in the DMCA and had lawyers interpret them for me. Once they've taught me, in a sense, how to reverse engineer legal documents and stuff like that, I don't usually have to see them that often, except for occasional questions like this issue about RTMP. Plus, good lawyers are expensive. Luckily, mine work for free; the EFF is great that way.