brand that's a sexy title, but I know what you're thinking: I don't have a 17-car garage, so what are you doing for me? Alright, so where's my noobs at? I'm a noob. Where's all my noobs? Do I have any noobs in here? Thank you for your honesty. Yeah, this is going to help you get going into the fun stuff, so the painful stuff you don't have to worry about until later. Alright, so I'm trying to help out the noobs. Do I have any professionals in here, any tuners? Anyone get asked to drop a performance engine into a minivan by one of their customers? There we go. He knows the pain. I'm trying to help you get home at night so your wife and kids can see your smiling face. And then also just for the average consumer, right? Who remembers the Volkswagen scandal? Dieselgate? Anyone remember that? Yeah. This is going to help you actually grab data off your car and not have to spend a lot of time learning how to look at that data. It'll just come up like, boop, here you go. And you can say, oh, I don't really think my car is doing what they sold me. You know? Like, I was supposed to get a clean diesel engine and now it doesn't look very clean. So this is what this code is for, okay? You know, I get it. The 17 cars: no one's got 17 cars they're doing back to back to back. But that's what I'm targeting, alright? And the other benefit is that it doesn't just do cars. Okay? I'll get into it a little bit later, but any control system, anything where the computer is sending a fixed packet length to another computer and there's no metadata, this code is probably going to help you do your job or whatever you want to do, whatever you want to hack, whatever you want to play with. Okay? So it's not just cars. Alright, so disclaimer. I have to do this for the military. These views are my own. They do not reflect the Department of Defense, the United States Army, the United States Air Force, or really anyone else other than me.
So if I say something stupid, it's on me, it's not on America, right? Alright, so this is going to be an oversimplification, and again I'm a noob so please do not nuke me after this, but let's just categorize all the nets into two flavors. Alright, we've got vanilla flavor first: your general use network. And your goal here is, I want something flexible, right? This is the internet, for example. I just want to be able to add and remove hosts. I want to be able to add and remove links. I don't have to worry about really what the net's doing. I just want to get onto it. I just want to talk to another endpoint. But you sacrifice something in that, and that's some guarantees. You have no guarantee your message is going to get there. You have no guarantee when it's going to get there. So I might have to resend. I might have to let them know, hey, this is my IP, what's your IP? And there's a bunch of metadata that's going along with this to make sure that the message actually gets there and it gets there in a reasonable manner, gets there all in one piece. So you have a bunch of metadata on this general use network. And then you have the exact opposite end of the spectrum, where I don't want it to be flexible. I want guarantees. When I press the brake in my car, I want that brake to work. All right? I don't want to have a resend of me pressing my brake. I want it to just work. I want it to work on time. So I need a guarantee that it's going to get there. I need a guarantee that it's going to get there at a particular time. So that's a control network. But I'm willing to lose the flexibility on the back end, right? So your car, you're not going to add an extra wheel to your car. Maybe, who's going to add an extra wheel to the car? Maybe there's some psycho in here. I don't know. But you're not so much worried about that. You want that guarantee of timeliness and that delivery guarantee. And this code is focused specifically on control networks.
What you lose, though, is if you want to look at the general use network, you've got all that juicy metadata that you can be like, ah, all right. I've got this tag, that tag. You probably don't even need to look at the actual payload itself. You can just look at just the metadata. But you don't have any of that control network on the car. It's just bits. Here's 64 bits. Here's another 64 bits. There's no, there's not a lot of metadata there. So a lot of people have already done a lot of work way, way smarter than me to help you play with, ah, general, these general use networks. Well, I'm going to, you know, Brent Stone is going to call a general use network. VoIP, the interwebs, smartphones, internet of things. Um, and just, in addition to academia, you know, you've got Wireshark, you've got Snort, you've got all these things that will plug into open source or widely used things. You've got a bunch of academics with 20 pound brains cracking this problem. Uh, in particular of these last four guys, I think you can get this on the Defcon Media Server if you want to nerd out and check out some of this. Um, these last four guys I thought did pretty good work with the general use network, you know, the automated reverse engineering. But I want to play with cars. Or I want to play with medical devices. Or I want to play with planes. Or a train. Or something else. There's nothing really there to help you. Until now, alright? So, um, let's, let's see what we can do. I just want to get a bunch of raw data off my control network. I want to click a button, then boom, it spits out some useful actionable information. And maybe it might be off by a bit or two, but it's going to get me 90% of the way there. Alright, that's, I just want the magic to happen. Um, and that's what this code is going to do, the code on the, the GitHub repo. And I'll talk a little bit more about that in a second. Um, but Brent, what are you doing on this slide? Like, what is all this? 
This is like an eye chart, right? We call this, in the military, an eye chart. Like, why are you doing this to us? Like, I did not deserve this. Why? Alright, the cool thing is, when you get on the GitHub, don't be overwhelmed. You only need to worry about the pre-processing. Everything's loosely coupled, so you can literally take a different protocol. I've already done this, by the way. You can take a completely different protocol, not change any of the rest of the code, the other 90%, and it will automatically reverse engineer that different protocol for you. So that's why I'm saying it's not just cars, alright? If you can sniff some data off a medical device, and it fits the assumption that every time some little widget on the medical device is talking to another widget, it always sends two bytes, or 64 bits or something like that, every single time, this code is probably going to work for you. Okay? When I do a demo a little bit later, I'm actually going to do three different flavors of demo, okay? Because we got set up where I couldn't bring my own laptop, so I'm going to do three different flavors of demo and make sure everyone is fully satisfied with this presentation. The demo is going to be doing these first two steps, alright? Just to show you how quick it can be, and it's going to be a real-time thing. If you want to put this on, like, an ARM chip or some SoC or something like that, you can totally do that, and it'll be efficient enough, really efficient actually. And then also, like, how do we actually slice and dice that bad boy? I'm going to specifically show you some slides and walk you through the mentality of this step of slicing and dicing the data so you can get out the actionable information. And then there's going to be a bunch more on the GitHub. Again, don't be overwhelmed, you just need to worry about the first part.
Alright, so let's walk through the logic, not the code, but the actual logic of, how do I slice and dice? So what do I mean by that? Let's say my data is a sentence: "This is a sentence!" I've got four words and one exclamation point, so I can slice and dice that guy. Alright, that's called lexical analysis to math nerds and stuff like that. And then each little piece of that is a token. And that process is tokenization. So I want to tokenize my data, I want to figure out where the boundaries are, the logical boundaries, and then I want to pull out each logically distinct piece that the original equipment manufacturer programmed into that chip to transmit over the wire. Alright, I can't just take the big chunk all at once, it's nonsense. And then there's this other process. Okay, that was lexical analysis, and then there's semantic analysis, saying, let's assume I've been successful at slicing and dicing correctly. Now I want to understand, alright, this one piece, what is this, what am I looking at? I want the computer to help me figure that out, right? That earlier trial where I showed you where they all looked exactly the same, I want that kind of information and the computer can figure that out for me. And this, it does now. So, let's copy and paste over to 64-bit payloads. Let's just assume Widget A on a car is talking to Widget B, like the brake is talking to a body control module or something, and sending 64 bits, 64 bits over and over and over again, so many times. Maybe it's synchronous, maybe it's not, but it's just sending it over and over again in the exact same format, the exact same payload length. I'm watching this happen over time and I'm recording it. You can just take all of that, all 64 bits, and then you can plot it or do whatever you want, and it's probably going to be nonsense.
Or you can do this slice and dice operation, this lexical analysis, and say, alright, let's grab bits 0 to 6 and just look at those and interpret those as a distinct piece of information, and I'm going to take bits 7 to something else. I'm going to keep slicing and dicing just like I did with a sentence, find the individual words, and then when I extract that, and this is an actual example from a real car by the way, if I look at bits 16 to 32, I get the vehicle speed, and then if I look at bits 32 to 39, I get another copy of the vehicle speed. Actually this isn't vehicle speed, this is the RPM per wheel, and as you can imagine there's four of these signals, and then there's a little bit of metadata at the front. So this is one example of what the output of this process would look like. So how do I actually do the slicing and dicing? How do I do this automatically? How do I teach a computer to do it? Alright, let's say I'm looking at the observations. So in this example, I'm looking at 10-bit payloads. Okay, everyone following me? And then my first observation: 0, 1, 1, 1, 0, 0, 0, 0, 0. And then some amount of time goes by and I see that the microcontroller sent another 10-bit message: 1, 0, 0, 0, so on and so forth. So what you're seeing from top to bottom is time elapsing as I'm watching observations, and then from left to right I'm watching the bits. Okay? So the way that you can get a very good educated guess on how to properly slice and dice these bad boys is to look for the least significant and most significant bits. No matter what they're doing, unless they're encrypting it or they deliberately write some shenanigans to reverse the bits and then put them back together again, you're going to have a least significant bit and a most significant bit. What happens is, if I take a copy of these observations and I shift it by one observation, okay, and then I XOR those two copies, what you get is, how many times is the bit changing between 0 and 1?
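The shift-and-XOR step just described can be sketched in a few lines of Python. This is a minimal illustration, not the code from the GitHub repo: the function name `transition_flags` and the 4-bit sample payloads are made up for the example, and each observation is assumed to be a list of 0/1 values, oldest first.

```python
def transition_flags(observations):
    """XOR every observation with the one that follows it.

    A 1 in the result means that bit position flipped between two
    consecutive observations; a 0 means it stayed the same.
    """
    return [
        [a ^ b for a, b in zip(prev, curr)]
        for prev, curr in zip(observations, observations[1:])
    ]


# Hypothetical 4-bit payloads (a simple counter), oldest observation first.
observations = [
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 0],
]

flags = transition_flags(observations)
for row in flags:
    print(row)
# [0, 0, 0, 1]
# [0, 0, 1, 1]
# [0, 0, 0, 1]
# [0, 1, 1, 1]
```

Pairing the list with a copy of itself offset by one (`observations[1:]`) plays the role of the shifted copy, so five observations yield four rows of flags.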
How frequently is it transitioning? Okay? And then if I take the sum of each column in that resulting XOR, then I get a very good indication of where my least and most significant bits are embedded within this larger payload. Alright, so as you can see: 1, 2, 4, 7, and then there's a break, and then there's another logical piece of information: 1, 3, 7. In this case, you know, the right-hand three bits were going from 1 to 7 or something like that. And when you plot it, it becomes even more obvious, especially to us as humans, right? You can see a very definite slope. There's a little hill and then it stops and then there's another hill, like, okay, the OEM probably has two pieces of information in these payloads that it's sending. Now I can make a pretty good educated guess. Alright, so how does that process actually work? So those of you familiar with, like, a hill climbing algorithm, or the machine learning folks in here, you probably have heard of that before. You do exactly that. Like, I start at the bottom of the hill or the top of the hill, I just keep going until I find that least significant bit that's transitioning a whole bunch of times, and then all of a sudden there'll be this sharp drop-off. That's probably the most significant bit of another piece of information embedded in that payload. So I'm just going to cut it right there. I'm just going to take however many bits that is and then I'm going to put that to the side: bin number one. I'm going to start over again, bin number two. You know, bit position 15. Here we go, 16, 17, 18. I got you. And then we're going up to bit 32 and then I make another cut: that's bin number two. Just those bits. I'm going to look at those later. And what happens when you see that, and again this is a real-world example from a car, is that I didn't give any heuristics or anything to the algorithm.
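Here is a rough Python sketch of the "sum each column, then cut at the drop-off" idea. It is a simplification under stated assumptions: the helper names `column_sums` and `find_tokens` are hypothetical, and the cut rule used here (cut wherever the transition count falls from one bit position to the next) is the simplest possible stand-in for the hill-climbing logic, not necessarily what the repo does. The counts are the ones read off the slide in the talk.

```python
def column_sums(flags):
    """Add up each bit position's transition flags across all observations."""
    return [sum(column) for column in zip(*flags)]


def find_tokens(counts):
    """Split bit positions into tokens, returned as inclusive (start, end) pairs.

    Walking left to right, transition counts rise from a token's most
    significant bit to its least significant bit. A drop in the count is
    treated as the start of the next token (assumed rule, no thresholds).
    """
    if not counts:
        return []
    tokens = []
    start = 0
    for i in range(1, len(counts)):
        if counts[i] < counts[i - 1]:
            tokens.append((start, i - 1))
            start = i
    tokens.append((start, len(counts) - 1))
    return tokens


# Per-bit transition counts from the talk's example: 1, 2, 4, 7, then 1, 3, 7.
counts = [1, 2, 4, 7, 1, 3, 7]
print(find_tokens(counts))  # [(0, 3), (4, 6)]
```

On real captures a noisy bit can produce a small dip inside a genuine signal, so a practical version would likely require the drop to exceed some threshold before cutting. That is the kind of detail that leaves the result "off by a bit or two".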
It really is that simple and easy many times; oftentimes it's that quick and easy to teach a computer to grab these bits and slice and dice it that way. Now I've got the information. But Brent, you said once I slice and dice, how do I know what kind of information that is? All right, that's a true point. One of the folks sitting in this auditorium right now has listened to the full speech. It literally would take me two and a half hours to explain this last part. I don't say that to show off or anything, it's just a matter of fact. And I don't want to put people to sleep. Just go on the GitHub after this if you're interested. If I forgot that hook-in, you know, check it out. We've got a document there. It's very well documented, if you want to know more about how I actually identify that this is that type of information versus that type, the semantic analysis. But I'll just give you one example real quick, because I'm going to show you a bunch of examples as part of this demo. So I just want you to understand what you're seeing. In this case, you've got the super microprint. So if anyone's got, like, eagle eyes, you can maybe read that. Probably can't. But what I did is I just threw this algorithm at the car. It looked at, I think there are probably 120 IDs floating around the car, all craziness. It grabbed slices from different payloads all over the car and said, hey, that little slice looks like that slice, which looks like that slice, which looks like that slice. And what you get at the end of that is, oh hey, Mr. Hacker, here's all the apples, here's all the oranges, here's all the bananas that I found in this car. I don't know the full name, but I can tell you these things all look similar. So you don't have to do that process anymore. Okay, it'll do it for you. And oh, by the way, in this particular sample, I was actually able to request J1979 diagnostic information. This is my Prius. I'm a conservationist, I guess.
At least if you ask other people in the military I am; they've got, like, the dually trucks with the four wheels and everything. I asked it, I was like, hey, give me vehicle speed while I'm gathering this data. And it turns out this is exactly vehicle speed for when I was going through this driving sample. So I won't get into the details of how I can claim accuracy with this stuff if I don't have any truth data. But what I found is that it's actually over 90 percent accurate most of the time across all 17 vehicles we looked at. So it might be off by a bit or two, and for those of you that have worked with this stuff, you know being off by a bit is kind of a big deal. But it will definitely get you very, very far along in the process, where you can get home at night, or you can believe that your OEM sold you the features that they said they sold you, or you could be a noob and show off to your friends. Alright, so let's reverse engineer some stuff. So again, I'm going to break this demo into sort of three different parts. Alright, the first part I'm going to walk through is just going to be the slides, because I don't have my own computer here. I'm going to show you all 17 cars, just the output of the semantic analysis that sliced and diced and then labeled and said these all look the same. And I'm going to prove to you that I didn't tweak it, I didn't tune it to any one car. I just gave it certain settings and then I let it go, and every single car got something out. For the second demo, I'm going to open it up to questions or let people look at it. But honestly it's just output saying this is how long things are taking. And then for the third demo, I'll hang out either outside the room or over on the side here with my personal computer. I've got it right here with this code loaded up with some data samples. If you just want to talk about it or you want to see it actually being run in real time, I'm more than happy to do that for you. Okay.
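To give a feel for the "these all look similar" grouping mentioned earlier, here is a small, self-contained Python sketch. The talk does not say which similarity measure the code uses, so Pearson correlation and a greedy grouping pass are swapped in here purely as an illustration; the signal names, the sample values, and the 0.9 threshold are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def group_similar(signals, threshold=0.9):
    """Greedily put each signal into the first group whose first member it matches."""
    groups = []
    for name, series in signals.items():
        for group in groups:
            if pearson(series, signals[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical extracted tokens: two speed-like signals and one counter.
signals = {
    "id_A_bits_16_31": [0, 10, 20, 30, 20, 0],
    "id_B_bits_0_7": [0, 5, 10, 15, 10, 0],
    "id_C_bits_8_11": [0, 1, 2, 3, 4, 5],
}

print(group_similar(signals))
# [['id_A_bits_16_31', 'id_B_bits_0_7'], ['id_C_bits_8_11']]
```

The same `pearson` helper could be pointed at a known reference, such as a vehicle speed series requested over J1979, to check whether an unlabeled group lines up with it.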
So what you're seeing, you're going to see two groups of plots, and each group is going to be a car. And again, it sliced and diced, it did the semantic analysis piece saying these all look the same, and then it put them all in one grouping for me. I didn't do any work, and it said, here you go, enjoy. Okay, so on the left is going to be a vehicle and on the right is going to be another vehicle. And if it says "cropped to fit", that just means it found a whole bunch of these and I'm trying to make it so it's not microprint for you all. So I cut off a couple of the signals that were on there so that we can sort of see it. Just so you know, every single sample here I collected in the same driving scenario. So I had people in a parking lot, and then we drove down a small little hill and went for about a three-and-a-half-mile drive with some stop lights and a lot of right turns and things like that. And then we came back to the parking lot and stopped. So you'll see it kind of goes up and then comes to a stop, and goes up, and comes to a stop. And then a lot of times, so for example this one on the left here, vehicle 3: if you've ever worked with CAN networks, controller area networks in cars, a lot of times the OEMs like to put in a counter to keep track, sort of like error checking. In this case it grabbed a bunch of counters from vehicle 3. Probably vehicle speed from vehicle number 4 here. I'll just go through these. On the bottom here: on the GitHub, I do have one data sample, again from my personal car, that you can just start jamming on right away. Just download it from GitHub, clone the repo, load up that data file and start hacking. Again, you can see the pattern looks very similar; again, I had everyone drive very similarly. On vehicle 11 it did have a little bit of trouble. I think it was off by one bit on that second signal.
Um, actually, yep, this is going around the hill, going around the hill. Alright, 17 cars now, that was sort of just proof number one, proof number two, I'm going to load up this video and then, uh, if anyone's got a question, um, I think it seems like a lot of people are real shy, so you can just come up here and ask me if you don't want to talk in front of the group, okay? And then after that, I'm more than happy to show people on my computer. Thank you very much.