Okay, I'm from China. The title of my presentation is Hacking Windows CE. It's about the security problems of PDAs and mobile phones. My presentation has eight parts. First, I'll give a brief introduction to Windows CE. Because ARM is the most widely used processor in PDAs and mobile phones, I'll also introduce the ARM architecture. Understanding memory management is very important for buffer overflows, so the second part introduces memory management with graphs. Compared with other Windows systems, Windows CE handles processes in a different way, so next I'll introduce the features of processes and threads on Windows CE. The Windows CE API search technique is interesting; I'll show how to write shellcode using this method. I will also introduce the implementation of system calls and use them to write shellcode. After this, I'll demonstrate a simple buffer overflow. I think this technique will be improved in the future. Then I'll talk about decoding shellcode. Decoding shellcode makes the real shellcode more universal, but the ARM cache architecture makes it tricky. Finally, I'll conclude my presentation.
Okay, let's move on to the first part. Windows CE is a very popular embedded operating system for PDAs and mobile phones. It is developed by Microsoft, and because of the familiar APIs, Windows developers can easily develop applications for Windows CE. Maybe this is a very important reason that Windows CE is popular. Windows CE 5.0 is the latest version, but Windows CE .NET is the most widely used version, and my presentation is based on it. Maybe for marketing reasons, Microsoft's website and MSDN offer Windows Mobile software for Pocket PC and Smartphone, but they are all based on the Windows CE core. By default, Windows CE runs in little-endian mode, and it supports several processors. Almost all embedded devices use an ARM CPU.
The ARM processor is a typical RISC processor that implements a load/store architecture: only load and store instructions can access memory, and data processing instructions operate on register contents only. There are six major versions of the ARM architecture. The demonstration device has a Samsung S3C2410 processor, which has an ARM9 core; its architecture version is ARMv4T. More and more PDAs are using the Intel XScale processor, whose architecture version is ARMv5TE. ARM processors support up to seven processor modes, depending on the architecture version: User, FIQ, IRQ, Supervisor, Undefined, Abort and System. System mode requires ARM architecture v4 or above. All modes except User mode are referred to as privileged modes. Applications usually run in User mode, but on the Pocket PC all applications appear to run in kernel mode; we'll talk about that later.
The ARM processor has 37 registers, arranged in partially overlapping banks, with a different bank of registers for each processor mode. The banked registers give rapid context switching for dealing with processor exceptions and privileged operations. In ARM architecture v3 and above, there are 30 general-purpose registers, the program counter, the current program status register (CPSR) and five saved program status registers (SPSRs). Fifteen general-purpose registers are visible at any one time, depending on the processor mode. By convention, R13 is used as the stack pointer in ARM assembly language, and the C and C++ compilers always use R13 as the stack pointer. In User mode and System mode, R14 is used as the link register, which stores the return address when a subroutine call is made. The program counter is accessed as R15, and it can be accessed directly by data processing instructions. This feature is different from other processors, and it is useful for locating code when writing shellcode. Understanding memory management is very important for buffer overflows.
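To make the processor mode rules concrete, here is a small C sketch that decodes the mode field (bits [4:0]) of a CPSR value using the standard ARM mode encodings. The helper names (cpsr_mode, cpsr_is_privileged, cpsr_mode_name) are hypothetical and for illustration only; they are not part of any Windows CE API.

```c
#include <stddef.h>
#include <stdint.h>

/* CPSR mode field (bits [4:0]) encodings defined by the ARM architecture. */
enum arm_mode {
    ARM_MODE_USR = 0x10, /* User */
    ARM_MODE_FIQ = 0x11, /* FIQ */
    ARM_MODE_IRQ = 0x12, /* IRQ */
    ARM_MODE_SVC = 0x13, /* Supervisor */
    ARM_MODE_ABT = 0x17, /* Abort */
    ARM_MODE_UND = 0x1B, /* Undefined */
    ARM_MODE_SYS = 0x1F  /* System, ARMv4 and above */
};

/* Return the mode field of a CPSR value. */
static unsigned cpsr_mode(uint32_t cpsr)
{
    return (unsigned)(cpsr & 0x1Fu);
}

/* Every mode except User is a privileged mode. */
static int cpsr_is_privileged(uint32_t cpsr)
{
    return cpsr_mode(cpsr) != ARM_MODE_USR;
}

/* Human-readable mode name, or NULL for an encoding not listed above. */
static const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr_mode(cpsr)) {
    case ARM_MODE_USR: return "User";
    case ARM_MODE_FIQ: return "FIQ";
    case ARM_MODE_IRQ: return "IRQ";
    case ARM_MODE_SVC: return "Supervisor";
    case ARM_MODE_ABT: return "Abort";
    case ARM_MODE_UND: return "Undefined";
    case ARM_MODE_SYS: return "System";
    default:           return NULL;
    }
}
```

For example, a Supervisor-mode CPSR such as 0x600000D3 decodes to mode 0x13 and is privileged, while 0x60000010 is User mode and is not.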
Compared with other Windows systems, Windows CE handles memory management in a different way. Windows CE uses ROM and RAM. The ROM in a Windows CE system is like a small read-only hard disk: it stores the entire operating system as well as the applications bundled with the system, and the data remains in ROM without power or battery. The RAM in a Windows CE system is divided into two areas, program memory and the object store. The object store can be considered something like a permanent virtual RAM disk. Unlike a RAM disk on a PC, it keeps the files in it even when the system is turned off. This is the reason that Windows CE devices typically have a main battery and a backup battery: they provide power to the RAM to maintain the files stored in it. Even when the user hits the reset button, the Windows CE kernel starts up by looking for a previously created object store, and the system uses it if it finds one. The other area of Windows CE RAM is program memory. Program memory is just like the RAM in a PC: it stores the heaps and stacks of the applications that are running. The dividing line between the object store and program memory is adjustable; the user can move it with the System control panel applet.
Windows CE is a 32-bit operating system, so it supports a 4GB virtual address space. Let's look at this graph, which shows the 4GB virtual address space layout. The upper 2GB is kernel space, used by the system for its own data, and the lower 2GB is user space. The upper part of user space, marked here, is used for large memory allocations such as memory-mapped files, and the object store is loaded in this region. From address zero up to 0x42000000, memory is divided into 33 slots, each of which is 32MB. Slot 1 is used for XIP (execute in place) DLLs, a new feature of Windows CE .NET.
Using slot 1 for XIP DLLs expands the virtual address space available to an application from 32MB to 64MB. This graph shows the slot 0 virtual address space layout. Slot 0 is very important: it is used by the currently active process. The first 64KB is reserved by the OS. It is followed by the process's code and data, and then the stacks and heaps. DLL files are loaded at the top of the slot.
Let's move on to the next part. Compared with other systems, Windows CE handles processes in a different way. Windows CE limits the number of processes running at any one time to 32. When the system starts, at least four processes are created. Looking at the previous graph: NK.EXE, which provides the kernel services, is always loaded into slot 97. FILESYS.EXE, which provides the file system services, is always in slot 2. DEVICE.EXE, which loads and maintains the device drivers for the system, is normally in slot 3. GWES.EXE provides GUI support. Other processes, such as explorer.exe, are also created.
Threads on Windows CE are similar to threads on other Windows systems. Each process has at least a primary thread associated with it upon start, even if it never creates one explicitly. A process can create any number of threads, limited only by the available memory. Each thread belongs to a particular process and shares the same memory, but the SetProcPermissions API can give the current thread access to any process. Each thread has an ID, a private stack and a set of registers. When a process is loaded, the system assigns the next available slot to it, loads the DLL files at the top of its address space, followed by the stack and the default process heap, and then executes it. When a process's thread is scheduled, the system copies the process from its slot into slot 0. This is not a real copy operation; the slot is just mapped into slot 0, and it is mapped back to the original slot allocated to the process when the process becomes inactive.
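The slot layout reduces to simple arithmetic: with 32MB (0x02000000) per slot, slot n starts at n × 0x02000000, so the 33 user slots end at 0x42000000 and slot 97, where NK.EXE lives, starts at 0xC2000000. The sketch below encodes that arithmetic. map_slot0_to_slot is a hypothetical helper for a simplified model of the slot 0 mapping, not a Windows CE function.

```c
#include <stdint.h>

#define SLOT_SIZE  0x02000000u /* 32 MB per slot */
#define USER_SLOTS 33u         /* slots 0..32 span 0x00000000-0x41FFFFFF */

/* Base virtual address of slot n. */
static uint32_t slot_base(uint32_t n)
{
    return n * SLOT_SIZE;
}

/* Slot number containing a virtual address. */
static uint32_t slot_of(uint32_t va)
{
    return va / SLOT_SIZE;
}

/* Hypothetical, simplified model of the slot 0 mapping: an address inside
 * slot 0 is translated to the same offset inside the process's own slot.
 * Addresses outside slot 0 are returned unchanged. */
static uint32_t map_slot0_to_slot(uint32_t va, uint32_t slot)
{
    if (va >= SLOT_SIZE)
        return va;
    return slot_base(slot) + va;
}
```

For instance, a slot 0 address such as 0x00011000 in a process that owns slot 5 corresponds to 0x0A011000 in this model.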
A process allocates a stack for each thread. The default size is 64KB, depending on the link parameters used when the program is compiled, and this size can be modified. The top 2KB is used to guard against stack overflow; we can't destroy this memory, otherwise the device will freeze. The remainder is available for use. Variables declared inside a function are allocated on the stack. Thread stack memory is reclaimed when the thread terminates.
Let's move on to the next part. Before we exploit anything, we must have shellcode that runs on Windows CE. Windows CE implements Win32 compatibility, and coredll.dll provides the entry points for almost all of the APIs. So we must find the addresses of the necessary APIs and then use these APIs to implement our shellcode. The traditional way to implement shellcode on other Windows systems is to locate kernel32.dll or ntdll.dll via the PEB structure and then find the necessary APIs via the PE header structures. Is there any structure like the PEB on Windows CE? The answer is yes: the KDataStruct kernel structure will satisfy us. It is defined in nkarm.h. KDataStruct is a very important kernel structure that can be accessed from user mode at a fixed address, PUserKData. The value of PUserKData is defined in kfuncs.h, which is in the normal SDK: it is 0xFFFFC800 on ARM processors and 0x00005800 on other CPUs. Let's look at the last member of KDataStruct, aInfo, at offset 0x300 from the start of the structure. aInfo is a DWORD array, and index 9 holds a pointer to the module list. So offset 0x324 from 0xFFFFC800 is a pointer to the module list.
Now let's look at the Module structure, which is defined in the kernel header file. The second member, pMod, is a pointer to the next module in the module chain. The third member, lpszModName, is the module name string.
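The offset arithmetic can be checked in C. The struct below is a hypothetical stand-in for KDataStruct that only models the position of aInfo at offset 0x300 and collapses all earlier members into padding; the array length and the constant names are assumptions chosen for readability, while the addresses, the 0x300 offset and index 9 come from the talk.

```c
#include <stddef.h>
#include <stdint.h>

#define PUSERKDATA_ARM    0xFFFFC800u /* PUserKData on ARM, per the talk */
#define PUSERKDATA_OTHER  0x00005800u /* PUserKData on other CPUs, per the talk */
#define MODULE_LIST_INDEX 9u          /* aInfo index holding the module list pointer */

/* Hypothetical stand-in for KDataStruct: only the offset of aInfo (0x300)
 * is modelled; all earlier members are collapsed into padding. */
struct kdata_model {
    uint8_t  before_aInfo[0x300];
    uint32_t aInfo[32];
};

/* Address of aInfo[index] for a KDataStruct located at kdata_base. */
static uint32_t kdata_ainfo_addr(uint32_t kdata_base, uint32_t index)
{
    return kdata_base
         + (uint32_t)offsetof(struct kdata_model, aInfo)
         + index * (uint32_t)sizeof(uint32_t);
}
```

With the ARM value of PUserKData this gives 0xFFFFC800 + 0x300 + 9 × 4 = 0xFFFFCB24, the address that holds the module list pointer.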
Using the lpszModName field, we can locate coredll.dll by comparing the Unicode string of its name. We should also notice the e32 member, an e32_lite structure defined in pehdr.h. Its e32_vbase member tells us the virtual base address of the module, so we can locate the virtual base address of coredll.dll. Its last member is e32_unit, an array of info structures. An info structure has two members: the first is a relative virtual address and the second is a size. The info structure is defined at the start of pehdr.h. Six entries of e32_unit are used by NK, and the first one is the export table position. So now we have the virtual base address of coredll.dll and the relative virtual address of its export table.
I wrote a small program to list the virtual base addresses of all loaded DLL files. In KDataStruct, offset 0x324 holds a pointer to a Module structure. Module structures form a chain, and offset 4 in each one points to the next module. The pointer to the Module structure is in kernel space, so this program only works in kernel mode; it will fail in user mode. However, most Pocket PC ROMs were built with the full kernel mode option, which means all applications appear to run in kernel mode, so this small program will execute on most Pocket PCs. With this program I got 0x01F60000 as the virtual base address of coredll.dll. But when I looked into it with the EVC debugger, this address was invalid; the valid data starts at 0x01F61000. I think maybe Windows CE, to save memory and startup time, doesn't load the header of the DLL file. Okay, the WinCE4.Dust virus uses ordinals to find API addresses.
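The module walk can be sketched in C. The structures below are simplified, hypothetical models of Module, e32_lite and info that keep only the fields described above (pMod at offset 4 on a 32-bit target, lpszModName, e32_vbase and e32_unit); the real kernel header and pehdr.h definitions contain more members, and real shellcode would read these fields from kernel memory rather than from C structs.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

#define EXP 0 /* e32_unit index of the export table (the first entry) */

/* Hypothetical model of info: an RVA and a size. */
struct info_model {
    uint32_t rva;  /* relative virtual address */
    uint32_t size; /* size in bytes */
};

/* Hypothetical model of e32_lite, reduced to the fields used here. */
struct e32_lite_model {
    uint32_t e32_vbase;            /* virtual base address of the module */
    struct info_model e32_unit[6]; /* the six entries used by NK */
};

/* Hypothetical model of Module: pMod is the second member, lpszModName the third. */
struct module_model {
    void *lpSelf;                   /* first member, unused here */
    struct module_model *pMod;      /* next module in the chain */
    const wchar_t *lpszModName;     /* module name (Unicode) */
    struct e32_lite_model e32;
};

/* Walk the module chain and return the module whose name matches, or NULL. */
static const struct module_model *find_module(const struct module_model *head,
                                              const wchar_t *name)
{
    for (const struct module_model *m = head; m != NULL; m = m->pMod)
        if (m->lpszModName != NULL && wcscmp(m->lpszModName, name) == 0)
            return m;
    return NULL;
}

/* Virtual address of a module's export table: base + export RVA. */
static uint32_t export_table_va(const struct module_model *m)
{
    return m->e32.e32_vbase + m->e32.e32_unit[EXP].rva;
}
```

The name comparison here is an exact wcscmp; the talk only says the Unicode name is compared, so the exact matching rule is an assumption.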
However, I think using the API name to find the API address is more reliable for shellcode, because the API name won't change even if the coredll.dll file changes. Since we have the export table position and the base address of coredll.dll, we can walk the IMAGE_EXPORT_DIRECTORY to find an API address by its name. To save space, we can use the hashing technique from LSD's Win32 Assembly Components. This graph, taken from LSD's Win32 Assembly Components, shows how to locate an API address via the API name.
Let's move on to the next part. test.asm is our final shellcode. It consists of three parts. First, it gets the export section: the virtual base address of coredll.dll and the relative virtual address of its export table. Then it uses the find function to find each API address via the hash value of the API name, and stores the API address in the position of its own hash value. The last part is the implementation of our shellcode proper: it changes the Bluetooth registry key and then uses KernelIoControl to soft-reset the device. This shellcode will turn on Bluetooth on some iPAQs.
We must pay attention to the LDR pseudo-instruction. Look here: why did I comment out these instructions? In the EVC debugger, this pseudo-instruction becomes a load from a literal pool whose location depends on the program, so the value 0xFFFFC800 is not inside the shellcode, and this instruction would make the shellcode fail. The next instruction, as shown in the EVC debugger, is okay. On Windows CE, R0 to R3 hold the first to fourth parameters of an API. If there are more than four parameters, the others are stored on the stack. So we must pay attention when writing shellcode, because the shellcode is always in stack memory and pushing parameters may overwrite it.
The eMbedded Visual C++ (EVC) debugger has some bugs that make debugging difficult. First, EVC changes the stack contents when the stack is reclaimed at the end of a function.
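The hash-based lookup can be sketched as follows. The hash function here is an assumption: it uses a common rotate-right-by-13 additive hash, which is not necessarily the exact function in LSD's Win32 Assembly Components or in test.asm. The export directory is modelled with hypothetical parallel tables that mirror the PE fields AddressOfNames, AddressOfNameOrdinals and AddressOfFunctions, already converted to pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate right on 32 bits; r must be in 1..31. */
static uint32_t ror32(uint32_t x, unsigned r)
{
    return (x >> r) | (x << (32u - r));
}

/* Assumed name hash: rotate the running hash right by 13, then add each byte. */
static uint32_t api_hash(const char *name)
{
    uint32_t h = 0;
    while (*name != '\0')
        h = ror32(h, 13) + (uint8_t)*name++;
    return h;
}

/* Hypothetical model of a PE export directory as parallel tables. */
struct export_model {
    size_t          number_of_names;
    const char    **names;     /* AddressOfNames */
    const uint16_t *ordinals;  /* AddressOfNameOrdinals */
    const uint32_t *functions; /* AddressOfFunctions (RVAs) */
};

/* Return base + RVA of the export whose name hashes to want, or 0 if none. */
static uint32_t find_by_hash(uint32_t base, const struct export_model *ex,
                             uint32_t want)
{
    for (size_t i = 0; i < ex->number_of_names; i++)
        if (api_hash(ex->names[i]) == want)
            return base + ex->functions[ex->ordinals[i]];
    return 0;
}
```

Shellcode built this way stores only a 32-bit hash per API it needs, which is why the technique saves space compared with storing full name strings.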
EVC's stack rewriting makes a buffer overflow difficult to debug, because the shellcode may be changed. Also, the instruction at a breakpoint is sometimes changed to 0xE6000010 in EVC, which is very annoying. And EVC allows code to modify the text segment without error while a breakpoint is set or while you step through it, although sometimes this is useful. The ARM debugger I tried also has these bugs, so I think maybe this is a bug in the debug API of Windows CE. Maybe.
Let's move on to the next part. The shellcode we talked about above is complicated, and it can't run in user mode. Is there any other way to implement shellcode? Let's look at an API implemented in coredll.dll: PowerOffSystem, which has no parameters. First, the code checks KTHRDINFO. KTHRDINFO is initialized in the MDCreateThread2 function in mdarm.c: if the application is running in kernel mode it is set to 1; otherwise it is 0. So the code tests whether this value is 1. If it is, the system uses the API set table to find the real implementation in kernel space; when your application runs in kernel mode this instruction executes, and R1 gets an address starting with 0x8..., which is in kernel space. Otherwise, if KTHRDINFO is 0, the system takes this path instead, which is the Windows CE system call. System call addresses range from 0xF0000000 to 0xF0010000, and there is a formula to determine the system call number:
system call number = 0xF0010000 - (256 × API set + index) × 4
Using this formula we can calculate the system call number. Let's calculate it for KernelIoControl, which is defined in kwin32.c. Its API set is 0 and its index is 99, so the system call number is 0xF0010000 - 99 × 4 = 0xF000FE74. The system call number is more stable than the address of the API implemented in coredll.dll.
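The system call formula can be written as a pair of C helpers, one computing the trap address and one recovering the API set and index from it. The function and macro names are hypothetical; the constants and the KernelIoControl example (API set 0, index 99, 0xF000FE74) come from the talk.

```c
#include <stdint.h>

#define SYSCALL_FIRST 0xF0000000u /* lowest trap address in the range */
#define SYSCALL_BASE  0xF0010000u /* the formula counts down from here */

/* Trap address for API set `apiset`, method index `index`:
 * 0xF0010000 - (256 * apiset + index) * 4 */
static uint32_t ce_syscall_addr(uint32_t apiset, uint32_t index)
{
    return SYSCALL_BASE - (256u * apiset + index) * 4u;
}

/* Inverse of ce_syscall_addr. Returns 1 and fills *apiset and *index if
 * addr lies in the trap range and is 4-byte aligned; otherwise returns 0. */
static int ce_syscall_decode(uint32_t addr, uint32_t *apiset, uint32_t *index)
{
    if (addr < SYSCALL_FIRST || addr > SYSCALL_BASE ||
        (SYSCALL_BASE - addr) % 4u != 0)
        return 0;
    uint32_t k = (SYSCALL_BASE - addr) / 4u;
    *apiset = k / 256u;
    *index  = k % 256u;
    return 1;
}
```

ce_syscall_addr(0, 99) returns 0xF000FE74, matching the KernelIoControl number computed above, and decoding that address gives back API set 0 and index 99.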
Unlike system call numbers, API addresses in coredll.dll are variable. So this shellcode is very simple: it only uses the system call, and it can execute in user mode as well as kernel mode. So I think this shellcode can be used on smartphones: Smartphone ROMs are not built with the full kernel mode option, and the Microsoft Device Emulator images don't have this option either. I tested this shellcode successfully in the Microsoft Device Emulator and on my demonstration device.
Okay, let's move on to the next part. hello.cpp is my demonstration vulnerable program. It reads the file binfile in the root directory with fread into the stack variable buf, and buf is only 512 bytes. So if binfile is larger than 512 bytes, the program will overflow. The printf and getchar calls are just for testing; they have no effect without console.dll in the \Windows directory. console.dll can be obtained from the Windows Mobile Developer Power Toys or from PocketConsole.
On ARM, BL is used to call a function. In the hello function, the first instruction stores LR on the stack; LR contains the return address into hello's caller. LDMIA SP!, {PC} is the last instruction in the hello function: it loads the LR value stored on the stack into the PC register, so the program flows back to the WinMain function when hello returns. So by overflowing the LR value stored on the stack, we obtain control when the function returns. The memory addresses allocated by a program, for both the stack and the heap, correspond to the slot it is loaded into. A process may be loaded into a different slot each time it starts, so the base address of its slot changes. But slot 0 is mapped from the current process's slot, so the stack address in slot 0 is stable.
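The overflow mechanism can be illustrated with a simplified, memory-safe model of hello()'s stack frame in which the 512-byte buf is followed directly by the saved LR. This layout is an assumption for illustration: the frame the compiler actually produces may contain other saved registers or padding, so the real offset of the saved LR can differ. The copy is clamped to the model's size so the example itself never writes out of bounds.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 512u /* size of buf in hello.cpp, per the talk */

/* Hypothetical, simplified frame: buf followed directly by the saved LR. */
struct frame_model {
    uint8_t buf[BUF_SIZE];
    uint8_t saved_lr[4]; /* saved return address, little-endian */
};

/* Copy len input bytes starting at buf, as an unchecked fread() into buf
 * would, but clamped to the size of the model so this code stays in bounds. */
static void copy_into_frame(struct frame_model *f, const uint8_t *in, size_t len)
{
    if (len > sizeof *f)
        len = sizeof *f;
    memcpy(f, in, len);
}

/* Read the saved LR as a little-endian 32-bit value. */
static uint32_t read_saved_lr(const struct frame_model *f)
{
    return (uint32_t)f->saved_lr[0]
         | ((uint32_t)f->saved_lr[1] << 8)
         | ((uint32_t)f->saved_lr[2] << 16)
         | ((uint32_t)f->saved_lr[3] << 24);
}
```

In this model, copying 512 bytes leaves the saved LR untouched, while 516 bytes replace it with the last four input bytes, read back in little-endian order.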
Instead of a stack address, you can also use the address of a jump instruction to redirect the program flow to your shellcode. I tried two methods to construct the exploit buffer. In the first, the buffer starts with padding instructions, followed by the return address that overwrites the saved LR; this return address points to our shellcode. This exploit buffer is larger, and the PDA froze when the hello program was executed. I think there may be two reasons. The first is that the Windows CE stack is small, and maybe the overflow destroys the 2KB guard at the top of the stack. So I changed to another method to construct the exploit buffer: I moved the shellcode to the start, and the last part is the return address pointing to our shellcode. This method succeeded. I'll demonstrate it.
Okay, first, Bluetooth is off, and I'll execute hello. It prints the address of buf and the size. When I press any key, the device restarts. It has restarted, and after the restart Bluetooth is on. I'll turn it off because it's dangerous.
Okay, let's talk about decoding shellcode. The shellcode we talked about is proof-of-concept shellcode, and it contains a lot of zeros. In some situations, other vulnerable programs filter special characters before the string buffer overflows; for example, an overflow through strcpy stops at the first zero byte. It's difficult and inconvenient to write shellcode without special characters, because if you use R0, the opcode will contain a zero byte. So we think about decoding shellcode. Newer ARM processors have a Harvard architecture, which separates the instruction cache and the data cache. This feature improves the performance of the ARM processor, but self-modifying code is not easy to implement, because it is affected by the ARM cache architecture and the processor's implementation. Let's look at a test shellcode.
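The remark about R0 can be checked by encoding a register-to-register MOV by hand. In ARM state, MOV Rd, Rm with condition AL and no shift encodes as 0xE1A00000 | (Rd << 12) | Rm, so a zero byte appears exactly when Rd or Rm is R0. The helper names below are hypothetical, and only this one instruction form is modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode "MOV Rd, Rm" (ARM state, condition AL, no shift). */
static uint32_t arm_mov_reg(unsigned rd, unsigned rm)
{
    return 0xE1A00000u | ((uint32_t)(rd & 0xFu) << 12) | (rm & 0xFu);
}

/* Return 1 if any of the four bytes of an instruction word is 0x00. */
static int word_has_nul(uint32_t w)
{
    for (int i = 0; i < 4; i++)
        if (((w >> (8 * i)) & 0xFFu) == 0)
            return 1;
    return 0;
}

/* Index of the first 0x00 byte in a buffer, or -1 if there is none;
 * strcpy() would stop copying at that byte. */
static long first_nul(const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] == 0)
            return (long)i;
    return -1;
}
```

For example, mov r1, r2 (0xE1A01002) is NUL-free, while mov r0, r1 (0xE1A00001) and mov r1, r0 (0xE1A01000) both contain a zero byte that strcpy would stop at.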
In the decoding test shellcode, the last four instructions move 1 into registers R1 to R4, and the first four instructions modify those last four instructions so that they load 99 instead, because R1 holds 99. So R1 to R4 get 99 on my demonstration device, even though, as you can see here, the last four instructions appear unmodified. But on Intel XScale processors, more NOP instructions are needed between the decoding code and the real shellcode; you can try it. Maybe only R1 gets 99 there. I think this may be because the XScale (ARMv5TE) core has a deeper pipeline than the ARM9 (ARMv4T) core in my demonstration device, which has a five-stage pipeline; the pipeline may affect the result.
Let's look at another method. In this method, the last four MOV instructions are encoded with exclusive OR using 0x88. At the start is the decoding code: a loop that loads one byte of the encoded shellcode, decodes it, and then stores it back to its original position. This time R1 to R4 don't get 1, even if you add a lot of loop instructions. We can see that the last four instructions are decoded, but the registers still don't get 1. So I think the cache architecture is involved: the decoded bytes may still be held in the data cache while the processor fetches the old instructions, so we need to flush the instruction cache and data cache after the shellcode is decoded.
First, I thought about a system call; I have used a system call for this successfully on PowerPC and SPARC. But the SWI instruction on Windows CE is just nonsense: the SWI handler defined in armtrap.s just moves LR to PC, so it's effectively a no-op. So I had to try another method. Because Pocket PC ROMs are built with the full kernel mode option, applications run in kernel mode, so I thought about using MCR instructions to flush the instruction cache and data cache. MCR instructions can only be executed in kernel mode. But unfortunately, this method also failed.
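The XOR encoding step from the second method can be sketched in C. It only models the byte arithmetic: it does not, and cannot, show the cache behaviour described above, since on the device the decoded bytes land in the data cache while the instruction cache may still hold the old ones. The helper names are hypothetical; the key 0x88 is from the talk.

```c
#include <stddef.h>
#include <stdint.h>

#define XOR_KEY 0x88u /* key used in the talk's second decoding test */

/* XOR each byte with key in place; the same call encodes and decodes. */
static void xor_buf(uint8_t *p, size_t n, uint8_t key)
{
    for (size_t i = 0; i < n; i++)
        p[i] ^= key;
}

/* b ^ key is zero exactly when b == key, so the encoded buffer is
 * NUL-free if and only if no input byte equals the key. */
static int encodes_without_nul(const uint8_t *p, size_t n, uint8_t key)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] == key)
            return 0;
    return 1;
}
```

For example, the four bytes of mov r1, #1 (0xE3A01001, stored little-endian as 01 10 A0 E3) encode to 89 98 28 6B, none of which is zero, and a second xor_buf call restores the original bytes.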
Okay, let's move on to the last part. The code we talked about is a real-life buffer overflow example on Windows CE. It's simple, but I think this technique will be improved in the future, because the instruction cache handling and the decoding shellcode are not good enough yet. The Internet and handheld devices are growing quickly, and the threats to PDAs and mobile phones are becoming more and more serious. Patching Windows CE is more difficult and dangerous because the entire operating system is contained in ROM: if you want to patch a system flaw completely, you must flash the ROM, and that is dangerous and difficult for normal customers. I consulted a lot of people on the Internet; they're all listed here. I'd like to give special thanks to Nestor, who helped me a lot. That's the end of my presentation. I hope you enjoyed it, and sorry for my poor English. Thank you.