Sorry for the delay. Okay, can I start? Anytime? Okay, sure. Hello everybody, thanks for joining. My talk is AutoRF, a framework for automatic firmware reverse engineering. I'm working as an open source firmware developer at 9elements, and we wrote this framework just for fun and not for profit. It's open source and has been available on GitHub since yesterday. It's written in Golang and it allows you to do black box testing of firmware. It does abstract syntax tree generation and is then able to generate C code out of that. The whole thing is backed by a MySQL database, and I'm going to explain all the details in the next few slides.

So what's boot firmware? As you all might know, that's the most privileged code running on your machine. When you press the power button, it does all the hardware initialization. For example, that's coreboot or UEFI, and it usually loads the bootloader from disk, like GRUB, and then the bootloader loads the operating system. In this talk I'm going to focus on the firmware part only, ignoring any later stages.

If you look at coreboot, it's written in assembly and C and divided into multiple stages. Each stage loads the next stage, and every stage has a specific hardware initialization task. Let me show you. If you press the power button, and you see time running along here, first of all there is the bootblock, written in assembly, which loads the romstage that does the DRAM init. Once DRAM is available, we can initialize other devices like PCI, PCI Express and so on. After that the bootloader, or in the case of coreboot it's called the payload, is loaded into memory, and that loads the operating system. I only care about the first stages, as shown here.
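The stage chain described above can be written down as a small model. This is only an illustrative sketch, not part of the framework: the `Stage` type, the `in_scope` flag and the helper names are made up here, and "ramstage" is assumed to be the stage meant by "once DRAM is available we can initialize other devices".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str
    job: str
    in_scope: bool  # True for the early firmware stages the talk focuses on


# Order follows the boot flow described in the talk.
BOOT_FLOW: List[Stage] = [
    Stage("bootblock", "first code after power-on, written in assembly", True),
    Stage("romstage", "DRAM init", True),
    Stage("ramstage", "device init (PCI, PCI Express, ...)", True),
    Stage("payload", "the bootloader; loads the operating system", False),
    Stage("operating system", "everything after the firmware", False),
]


def stages_in_scope(flow: List[Stage]) -> List[str]:
    """Names of the stages the talk cares about, in boot order."""
    return [s.name for s in flow if s.in_scope]


def next_stage(flow: List[Stage], name: str) -> str:
    """Each stage loads the next one; return the successor's name.

    Raises IndexError when called for the last entry.
    """
    names = [s.name for s in flow]
    return names[names.index(name) + 1]


print(" -> ".join(s.name for s in BOOT_FLOW))
print("in scope:", stages_in_scope(BOOT_FLOW))
```

The list is ordered, so "each stage loads the next stage" is simply the successor in the list.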
So in a perfect world, for example coreboot running on the ThinkPad X230, there are no blobs and everything is open source. Some parts of it have basically been reverse engineered, but that also means there's no register documentation for those parts. You write magic numbers somewhere into the PCI space and then everything works, which isn't that nice, but at least it's open source and you can see what it's actually doing. There's microcode, shown here as the yellow bar, which is loaded early in the bootblock. It's not directly code, so I'm going to ignore it and claim it's 100% open source even though it loads microcode.

We've got another example here: coreboot on a Supermicro X11SSH server. It uses multiple blobs. In this case it's the Firmware Support Package, FSP. It occupies about 500 KB of space in the SPI flash, while coreboot itself, so the open source part of the firmware, only occupies 200 KB. coreboot jumps into the FSP and lets the FSP do all the hardware initialization, as you can see here. One big part is in the romstage, bringing up the DRAM, and then in the ramstage it does all the device initialization. Sometimes coreboot needs to undo what the FSP does, because some features cannot be disabled. And there are also microcode updates; again, they're loaded in the early bootblock. But again I'm only concentrating on the blobs. So what are blobs?
Binary large objects. There's no source code available for the blobs, and usually there's no documentation of what they do. Because of that, they're difficult to integrate into open source firmware. There might be security issues, because you don't know which registers are already locked and which ones you should still lock in the open source firmware. There are no linker symbols, which means you always have to use the whole blob even though you only need a single feature out of it. There's no garbage collection and the size cannot be reduced. Debugging is also quite difficult. Usually there's no debug output; that would increase the size, and I guess it would give insights into what the blob actually does.

So what are we allowed to do? I'm from Germany, and this is from the law, translated into English. I'm allowed to gain basic knowledge of the ideas and concepts by just loading, displaying and running the program, and I don't need permission from the copyright owner. That basically means I can do reverse engineering, even if there's some sentence in the license that doesn't allow it. The question is what the implications of reverse engineering are in terms of the law. The law isn't very precise. Okay, there's another passage that talks about decompilation. It says it's only allowed for interoperability, but it doesn't say what decompilation is at all; it just mentions the term. It also says that if you decompile something, you're not allowed to create a program that does basically the same, so we cannot create free and open source software out of it.

So the only thing we can legally do is black box testing, and that's what is shown here. With black box testing we have the firmware that talks to the hardware, and we can observe the I/O
and we can observe BIOS options, like the settings you usually change in the BIOS menu. If we observe both, we can generate a model out of it, and that model can be free and open source. But there are some issues. It only works on a single piece of hardware. It's very difficult to see branches inside the firmware and to catch corner cases. There might be some fix-up for specific devices, and if we don't have that device, we simply won't see that fix-up when we observe the I/O, so our model is likely incomplete. There's lots of data we need to analyze. And we cannot just put the firmware in an emulator like QEMU, because it doesn't emulate the hardware.

Okay, so what do we need to do? We run it on real hardware. There are similar projects like SerialICE and Avatar² that put the firmware inside a patched QEMU, but that's something we haven't done. We put the emulation inside the firmware. In this case it's a library called libx86emu. It allows us to trace I/O and send it over the serial port, and it also allows us to upload BIOS options, that is to change them, and to fake I/O or skip I/O. The whole thing is done in a client-server model. Let me explain.

This is an open source library; you can find it on GitHub. It emulates x86 CPUs, so AutoRef currently only works on x86, but in theory it runs on any architecture. The library allows you to hook specific instructions, and in our case we only care about I/O instructions. It only does 32-bit? Yes, that's right, it only does 32-bit.

What we do is, we don't jump to the ramstage. We continue running in the romstage and instead load the library, and the library emulates everything. So the ramstage and the payload run in this emulated environment, and the stages don't even know that they're being emulated. Again, here you can see the blobs. What we are doing is tracing all the I/O
of the blobs in the ramstage. This method works on any hardware where there's a serial port. With that we can observe memory accesses, like reads and writes, accesses to the PCI config space, accesses to I/O ports and model-specific registers, and the CPUID instruction. That's the complete set of I/O the firmware uses to talk to the hardware, and we can just observe it with this library. You can see it here. It's not that readable, but you get an impression of what it does. In this case it's mostly PCI reads, and the last two are memory accesses.

What we can actually do is convert it to something more readable, like C code. You can compile it and put it into your firmware to replace the blob. It actually won't work in general: it only works on that single machine with that single configuration, which is likely not what you want.

So what we can do is generate a syntax tree, and that's the main feature of the framework. It collects traces, puts them into the database and then merges all the traces into a syntax tree. As you can see, it's a directed graph. Say you have two runs: on the first run BIOS option one is set to true, on the second it's set to false, and you can see there's a slight difference in the I/O
operations. We can then merge them into a graph and actually see which BIOS option triggers which path. The merging is done with the LCS algorithm, which tries to generate a minimal graph. That has some issues, as it turned out. We can then convert the abstract syntax tree to a high-level language. Again, we only implemented this for C, but in theory we could generate anything, like Go or Ada or whatever. The analysis is quite time-consuming. Right now we've only tested this with QEMU and a small example, and in that case it's okay, but on real hardware it's pretty slow.

You can see here the abstract syntax tree converted to C code. And this is another example: the framework ships a simple QEMU coreboot image, so it runs in QEMU. As you can see, that's the generated graph. Every node is a single I/O instruction. If you generate C code out of it, it doesn't look that pretty. It's not even complete here; it just continues below the slide. That's something we need to work on, generating pretty C code.

There are quite a few to-dos. We would like to add plug-in support and detect loops inside the firmware. Right now we can't detect read-modify-write operations, where usually only one bit is set or cleared. There's no dead code detection, so if we run into a code path we shouldn't and it just crashes, we won't see that. We have no segfault or reboot detection. We need to work on generating pretty C code and optimize the abstract syntax tree even more.

Then the usual question: can we reverse engineer something like the complete FSP? I have some stats. I assume you have four CPU sockets with four CPUs each, four DIMMs for memory, and you test with different DIMMs, two PCI bridges, 16 USB ports, quite a few PCI Express lanes, which can again be equipped with different devices, and up to eight Serial ATA ports. The FSP also has quite a lot of options that can be configured, and I only assume they can be true or false. Then we have two to the power
of 588, times 15 minutes to collect a single trace, and that's going to take quite a while. And after you've collected all the traces, you still have to analyze the C code. So black box testing might give insights into small parts of the firmware, but it's likely not the correct approach to reverse engineer a complete hardware initialization like the FSP. And that's it already. I think it was quite fast. Do you have any questions so far?

Yes? So the question was whether I record timings. Right now I don't record timings. As you have seen, we only trace I/O in the ramstage, so it isn't possible yet to capture DRAM training, only everything that lives in the ramstage. As we send everything over the serial port, the COM port, it takes a while, and we might run into timing issues where bits are already set because we wrote out a single line over serial. So yeah, we could work on that, maybe use a different approach to sending the traces.

Did you observe any behaviors that were obviously bugs? So I only recorded one trace. Sorry, the question was whether I observed any bugs in blobs. I only ran a single trace on an FSP, and that's why I said it takes 50 minutes. I didn't have that much time to collect more traces. I continued working on the QEMU demonstration, because there it takes about 5 seconds to capture a complete trace. But that's something we could do: just let it run overnight and then analyze the C code.

There's an emulator for running UEFI option ROMs, and when you run x86 option ROM code in an emulator, you run into lots of things like null pointer dereferences and other things that you don't see when you run it natively. Do you see that when you run the emulator? So we are running blobs that are usually compiled for protected mode, so we won't see the issues you get when running option ROMs, which are usually compiled for 16-bit real mode. I doubt that there will be lots of issues with it, because we can see that those blobs are working quite well on
modern computers. And yeah. Yes, please?

I know that, on the other hand, the definition of a trace is a matter of interpretation. If you write an emulator, it's a trace of every instruction it carries out. I would still consider this a trace; others would consider it to be a debugging exercise. But in this regard, the way the law is formulated works in your favor, because you don't see every single instruction. You just have branches that execute differently, and you don't know what's happening in between. You don't have great timing, not even an idea of how long it takes between two pieces of memory being read or written. So finding the heuristics just to spot loops will already be quite a bit of a challenge. There may be shortcuts there. I would suggest considering some of that.

Yeah. I think the note was about the German law. It turns out that we are likely on the safe side of the law, because we really only capture the traces and don't use it as a debugger to trace every single instruction. And it could be quite challenging to detect branches and tight loops on the target, because everything is slowed down and we only really see the I/O and not the decision to send out a specific I/O instruction.

Any other questions? Okay, thank you for attending.
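As a note on the merging step described in the talk, where the traces of two runs are combined with the LCS algorithm: the sketch below shows the basic idea, assuming LCS means longest common subsequence. It is not the framework's code. The trace strings, the function names and the `common`/`run_a`/`run_b` tags are invented for illustration, and it only merges two runs into a flat, tagged list rather than into a graph stored in a database.

```python
from typing import List, Tuple


def lcs_table(a: List[str], b: List[str]) -> List[List[int]]:
    """table[i][j] is the LCS length of the suffixes a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def merge_traces(a: List[str], b: List[str]) -> List[Tuple[str, str]]:
    """Merge two I/O traces, tagging each operation by where it occurs.

    Operations on the longest common subsequence are tagged "common";
    the rest are tagged "run_a" or "run_b" and mark where the runs diverge.
    """
    table = lcs_table(a, b)
    merged: List[Tuple[str, str]] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            merged.append(("common", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            merged.append(("run_a", a[i]))
            i += 1
        else:
            merged.append(("run_b", b[j]))
            j += 1
    merged.extend(("run_a", op) for op in a[i:])
    merged.extend(("run_b", op) for op in b[j:])
    return merged


# Hypothetical traces: the same firmware run with one BIOS option true, then false.
option_true = ["pci_read 00:1f.0 0x80", "io_write 0x80 0x01",
               "mem_write 0xfed00000 0x1", "io_write 0x80 0x02"]
option_false = ["pci_read 00:1f.0 0x80", "io_write 0x80 0x01",
                "io_write 0x80 0x02"]

for tag, op in merge_traces(option_true, option_false):
    print(f"{tag:7} {op}")
```

In this example the only operation tagged `run_a` is the memory write, which is the kind of difference that would point to the BIOS option guarding that access.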