All right, thanks everyone for being here, for waiting, and also a very big thanks for being with us in this large-scale experiment. NorthSec has always been a large-scale experiment, and we've always been doing things that we don't know how to do, but still we do it. So let me introduce you to our next speaker. She's called Bex. Bex enjoys tinkering with systems in undocumented manners to find hidden sources of computation, like a lot of us. She has previously studied the weird machines present in application linkers and loaders, publishing some nifty PoCs along the way, but has since turned her focus toward the kind of loaders that bootstrap systems. Bex is currently a senior security researcher at NARF Industries. The title of her talk is going to be "Regions and Types, Types Are Policy and Other Remnants" — I'm sorry, I mis-said it — "Regions Are Types, Types Are Policy and Other Remnants". Thank you, Bex. Thank you. I will just get this started. Hi. So I'm Bex, and hopefully you'll be seeing my slides soon. Okay, thanks for coming to my talk. As I said, I'm a senior security researcher at NARF. I previously worked with Sergey Bratus at the Dartmouth Trust Lab, and the work I'm presenting is mostly what I did with Sergey at Dartmouth. I've continued some of this work since then, but that has not made it into this talk. And in case you hear barking in the background, that would be this little beast I have right here, named Roomba. Yeah, doesn't she kind of look — can you see the resemblance? Yes. So I started this work searching for this elusive root of trust. I tore apart hardware, I searched software, and I just really wanted to better understand these internals that are normally just trusted, and these layers of abstraction that we build our systems upon.
Really, where I got to was bootloaders and loaders, and I actually consider them to be the same software, all under just a loading capability: trying to set up the address space and set up your system for something — a new binary — to run. Although there have been significantly more sightings of the TPM chip, the trusted platform module, than of the Loch Ness monster, its presence doesn't automatically signify safety, even when it's used to its fullest extent, because software can still be flawed — even if it's software that is signaled as good — because software is not easy. So even simple software — "simple" I'm using with scare quotes, which you probably can't see anymore — simple software like bootloaders suffers time and time again from bugs: memory corruption, logic flaws, et cetera. In the classical view of trustworthiness, what we are taught to look for is confidentiality, integrity, and availability, but the hard things to measure are really these other features that we need: the software behaves as expected, there are no unexpected behaviors, the software is predictable and reliable. And just as a little story, I guess, not that I have the time: when I was a kid, my parents decided to make use of the V-chip in the TV, which was supposed to block you from watching certain content or channels if you didn't have the PIN code. And of course I wanted to watch my cartoons. So, surprise — it worked on the first try. Later that week, I knew I typed it wrong and it still worked. So yes, it behaved as expected, but it was unexpected that it would unlock regardless of what four digits I typed in there. So, okay, rambling. I'm interested really in policy granularity. Not all software policy is the same — which is a silly statement to make — but some behaviors and some components of your software are hard to describe with a single flat piece, bits of tags and labeling and policy.
Types in programming languages really do help us craft software that is more reliable, that we understand; without types, I don't know, all chaos would break loose. And so types are a great place to start. They're a great piece. They are at a very nice granularity for writing policy because they are a policy. But sometimes we have to look outside of just these types to enforce behaviors that are harder to express within the programming language — because memory regions are also types. They can be objects; sometimes they are treated exactly as objects. Sometimes, though, they are just holding pieces of data that are related in a certain way, should be treated in the same way, and should not be accessible by something else. So these types that are memory regions — they're also a policy. They're describing a policy. And what I'm going to get into is a memory-region-based policy. I'll be talking about my thesis work, called ARBWAC, which I'll expand on later. But there's also work done at Dartmouth called ELFbac, ELF Binary Access Control, which was very much a memory-region-based type policy — an intra-process policy, instead of a policy between processes where the address space acts as the boundary. Within the single address space of a single running process, you can have defined policies between regions of memory depending on the execution. And that's very much what I'll be getting into here. The granularity is pretty important, but first, just what do I mean when I talk about types? Most people use types in a very programming-language-centric way. Types in programming languages are assigned to constructs — be it a function parameter, a return value, a data object — and these types are there to enforce relationships between constructs. I'm interested in this broader view of types, because if it quacks like a type, let's call it a type — although that might be a little confusing — but let's just broaden what we think of as types.
Because types are there to define what operations are allowed on an object. They define semantics, they define behaviors. What about objects that exist outside of programming languages? As I mentioned before, bootloaders really do sometimes use a memory region as an object. It's something they're operating on, something they're crafting — this first-class thing that is important to them. And I think there's a larger picture that is hard to actually capture within the source code of the loader and the bootloader, that we want to be able to see, define, and actually write policies about. And that is really the regions of memory that are there and how they're treated. So in programming languages — here I'm looking at interpreted languages — we might not realize that there's a type on the strings and the integers that we're using, but you'll see, perhaps at runtime, perhaps before it really runs, that if you do something that isn't supported by how the types are defined, you'll get errors. So if you try to concatenate a string with an integer in Python and Ruby, you'll get some complaints. They complain in different ways, but that's just part of how their type policy is defined and implemented in the language. JavaScript is its own weird beast: sure, you can actually concatenate an integer to a string. If you write it up a little differently and put two concatenation signs, it still works. If you reverse it and concatenate a string to an integer, you get NaN, not a number, and then you can find other interesting things like true plus true equaling two, and so forth. These are lots of fun little things that were discussed in a talk called WTFJS, and I have a link to that at the end, so that's a fun talk to watch.
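As a minimal sketch of the interpreter behavior just described (this is standard Python, not anything from the talk's tooling): Python's type policy rejects concatenating a string with an integer at runtime.

```python
# Python refuses to concatenate str and int: that operation is not
# defined between those two types, so the interpreter raises TypeError
# at runtime — the language's type policy being enforced.
try:
    result = "count: " + 1
except TypeError as exc:
    result = f"rejected: {exc}"

print(result)  # the concatenation was rejected with a type complaint
```

Ruby complains similarly (`TypeError: no implicit conversion`), while JavaScript silently coerces — the same operation, three different type policies.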
In compiled languages, you'll see types used as a way to also enforce these relationships between code and data, and if you, say, try to concatenate or add an integer to a string, you'll see complaints when you build your program — maybe less verbose complaints in Java. Types in Rust are pretty neat. Not only do they express this sort of semantic relationship between code and data, but there are also types there that ensure concurrency and memory safety, which is not necessarily something you get out of C. There are other compiler-related constructs that may not actually be discussed as types but can be thought of as types, and as compiler-injected policies. Dependent types are an interesting piece of research, where the type of an object is actually dependent on its value, so you may not be able to run all these enforcements before the program is run — but there's a lot of interesting research there. One example of a dependent type, on the left: you might say that the type of this object is "list of length two", and so if I try to assign it a list of length one, it won't compile. Typestate is something a little different, and it's actually fairly related to this work. For an instance of an object, what operations can be applied to it depends on its current state. Say you have some object that you can read and write to, but only after it's been allocated — after some allocate has been called on it. These enforcements are done based on these runtime states, and you statically define the allowed operations depending on that state.
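The typestate idea from the allocate-before-use example can be sketched like this. This is a hypothetical `Buffer` class invented for illustration (runtime-checked, whereas real typestate systems check statically): the allowed operations are declared per state up front, and each operation is checked against the object's current state.

```python
# Typestate sketch: operations legal on an object depend on its state.
class Buffer:
    # statically declared table: which operations each state permits
    ALLOWED = {"unallocated": {"allocate"}, "allocated": {"read", "write"}}

    def __init__(self):
        self.state = "unallocated"
        self.data = None

    def _check(self, op):
        if op not in self.ALLOWED[self.state]:
            raise RuntimeError(f"{op} not allowed in state {self.state}")

    def allocate(self, size):
        self._check("allocate")
        self.data = bytearray(size)
        self.state = "allocated"

    def write(self, offset, value):
        self._check("write")
        self.data[offset] = value

buf = Buffer()
try:
    buf.write(0, 0xFF)       # rejected: nothing allocated yet
except RuntimeError as exc:
    print(exc)
buf.allocate(16)
buf.write(0, 0xFF)           # fine after allocate()
print(buf.data[0])
```

A real typestate checker would reject the first `write` at compile time rather than at runtime; the table of state-to-operations is the part that maps onto this talk's substage policies.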
Control flow integrity is compiler-injected: the compiler decides where execution may jump to at certain call sites and enforces this at runtime based on the tables it comes up with. And stack canaries are sort of another kind of compiler-related type, where the compiler builds into your program a check that there is no stack corruption. We can say this a little differently: there's a type on this little piece of data on the stack saying it is not written to or overwritten in an unexpected way. So it's sort of enforcing a read-only bit on that piece of the stack, which actually ensures — or at least helps detect — stack corruption. Outside of programming languages, there are other behaviors that have this kind of type feel to them. File permissions, for example: files get labels, and so do the owners of processes, and there is logic there that determines whether a process with this owner can access a file with this given set of tags. SELinux is much more fine-grained than file permissions. If you've not actually dug into the details, you're kind of lucky — it's nasty — but it's also a very integral part of some systems; it's very heavily depended on, still there. Okay, I can't say too much about SELinux just because I haven't checked to see how it's being used these days, but it is very much a bag of types and permissions, and very fine-grained: whether or not, given this domain tag as a process, you can open a socket of this type, or run this system call. And read, write, execute page permissions — in a sense that's also types and policy right there, deciding whether or not this program can execute this address on this page based on the page permission. So, moving from types to policy: how is writing policy like training a puppy? Well, it helps when the policy objects — or your stick, or treats for the puppy, I guess — are well sized.
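The file-permission logic mentioned above can be reduced to a few lines. This is a simplified sketch of the classic Unix decision (owner bits, then group bits, then other bits — ignoring root, ACLs, and the rest), with made-up UIDs/GIDs for illustration:

```python
# Toy Unix read-permission check: a file carries owner/group labels plus
# a mode; which permission-bit triple applies depends on the process's
# identity. Exactly one triple is consulted, in owner → group → other order.
import stat

def may_read(file_uid, file_gid, mode, proc_uid, proc_gids):
    if proc_uid == file_uid:
        return bool(mode & stat.S_IRUSR)   # owner bits apply
    if file_gid in proc_gids:
        return bool(mode & stat.S_IRGRP)   # group bits apply
    return bool(mode & stat.S_IROTH)       # everyone else

mode = 0o640  # rw-r-----
print(may_read(1000, 100, mode, 1000, {100}))  # owner may read
print(may_read(1000, 100, mode, 1001, {100}))  # group member may read
print(may_read(1000, 100, mode, 1002, {999}))  # other may not
```

The point of the comparison in the talk: this label-plus-logic shape is "types as policy", and SELinux is the same shape with far more label kinds and far more rules.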
So for inter-process policies, I was mentioning Unix file permissions: they're great, but sometimes they can be a little too coarse, depending on your attack model, depending on many things. SELinux was actually developed to help fill in the gaps that Unix file permissions couldn't really cover. And it's kind of hard to come up with something just right. I think there are definitely other things beyond just file permissions and SELinux that could give us some interesting inter-process policies. For intra-process policies, as we've seen, there are the read, write, execute page permissions. Those can be fairly coarse: any page that's readable is readable at all times, unless changed somehow. But we don't really have rules for when to change page permissions that are enforced outside the context of the process. The C type system is kind of an intra-process thing, but it's fairly weak, considering. Region-based typing, though, can be very flexible. It doesn't have to be a page or larger; it could be smaller. Thinking about regions as types might give us a nice granularity. And for the just-right granularity, we want something that describes what matters, naturally. So a wish list for a policy: we want to be able to describe what matters — something relevant, something concise. Describe only what matters. And, well, okay, what does matter? Again, looking at bootloaders, we'll explore what matters, to a certain degree, in bootloaders. And the thing is, you can get as specific as you want, but maybe it doesn't matter that this particular address is readable at this time. What matters, however, is that we can understand the policy we write, and maybe be able to make it as coarsely grained or as finely grained as what time allows and what our knowledge allows for. So it's something that we can iterate on as we better understand the software. And if it looks like a duck and it quacks like a duck, but it needs batteries, you probably have the wrong abstraction.
So it really helps to have the right policy objects for what we're trying to do. And what's right, what we're going for, is sort of this intent-level semantics. Here's a sentence: "the gostak distims the doshes." We may not understand these words, but the grammar reveals relationships. So, looking at U-Boot — it's kind of a monster, a pretty major, large base of code, if you've ever pulled it — let's see what we can do. The semantics of U-Boot can be derived by understanding code and data relationships. And the code/data relationships may depend on how the bootloader is executing: its current phase and that phase's goals. So, just as a quick aside: what is a bootloader? It's software that transduces static, on-disk binary images — images containing actual machine code — into memory for execution. And the binary image itself — well, there are plenty of examples of those: ELF, PE, Mach-O, if you've heard of them. Some of them are very much just machine code with very minimal metadata on top: perhaps a length field, an entry point field, an address of where it should be loaded. So there are a lot of different binary images. And the loader really just extracts that from disk, understands it, copies it into RAM, and jumps into it. Bootloader or regular loader — they're kind of the same in my mind. So, some useful terminology for the rest of this. Address space is just the general term I use to refer to addressable memory, and in the bootloader I'm looking at, it's actually physical memory. So really, technically, you can address anything. You might see some faults, depending on the context of what you address, but there's a lot you can address in memory without actually causing errors, even if it's not correct. A memory map is a model of the address space that semantically labels these regions. So what do we expect of our bootloaders? They discover and initialize resources and hardware.
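The "very minimal metadata" image format just described — a length, an entry point, a load address, then raw machine code — can be sketched as a parser. The three-field little-endian layout here is invented for illustration; it is not U-Boot's real image header, just the shape of one:

```python
# Sketch of a minimal binary-image header: three little-endian 32-bit
# fields (payload length, entry point, load address) followed by raw
# machine code. A loader parses this, copies the payload to load_addr,
# and jumps to the entry point.
import struct

HEADER = struct.Struct("<III")  # length, entry_point, load_addr

def parse_image(blob):
    length, entry, load_addr = HEADER.unpack_from(blob, 0)
    payload = blob[HEADER.size:HEADER.size + length]
    return {"length": length, "entry": entry,
            "load_addr": load_addr, "payload": payload}

# a fake 4-byte "program" that wants to live at 0x80000000
blob = HEADER.pack(4, 0x80000000, 0x80000000) + b"\x90\x90\x90\x90"
img = parse_image(blob)
print(hex(img["load_addr"]), len(img["payload"]))
```

Note that `load_addr` comes from the untrusted image itself — which is exactly the trust problem the talk turns to next.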
They prepare the address space. Maybe they prepare a stack, because the stack initially might not actually exist, or it might be in a region of memory that is very cramped. They might move their own image — self-relocate — to another place in memory while executing. They will load a binary image, hopefully, if they're going to continue on with the loading process, and prepare it for execution. And finally, they'll jump into it. So how should U-Boot behave? More specifically, I was looking at a build of U-Boot for the BeagleBone xM, and the SPL stage of boot. For this particular device, there's a series of bootloaders that execute. First there is the on-chip boot ROM, and then it loads the SPL, which is a small U-Boot image. That small U-Boot image initializes more hardware and eventually loads something larger, which I call U-Boot main — but we're going to focus on the smaller one. And the questions here are: for this SPL, do we know that it behaves as expected and never behaves in an unexpected manner? Does it act predictably, and does it act reliably? So, don't worry about this little image on the top; it's trying to give a sense of what successful execution looks like. The little arrows just contain the function names of what I consider different phases of boot, and above them I have labels of how the global state has changed, or what's important about those phases. So at certain points the stack has moved — several times, until the final one exists and it's done moving. It has to initialize some external RAM. And then it loads an image. So let's just look at what happens when it finds its target image and starts loading it. In this little piece of code here, there's a hard-coded fixed address, CONFIG_SYS_TEXT_BASE, where it copies part of a file from the SD card. And it parses that little header that was copied into memory with spl_parse_image_header.
And what you can't really see here is that spl_parse_image_header will actually set this value, spl_image.load_addr. So what's happening here is that the image on disk itself is telling the bootloader where the image should live in memory. It makes sense, because binary images might be built to run only at a single address — they might not be relocatable, they might only run if they're copied to a certain address in memory, and anything else could be a big problem for the image. However, the bootloader itself knows more about its address space than the image does: the load address may not be memory-backed, it might actually be memory-mapped IO, memory-mapped registers. But here, U-Boot will just blindly follow this address the target says it should be loaded at. Why not write it into MMIO? It's kind of undefined what will happen, but I bet the hardware will reset itself — I've tried doing it in an emulator, and you don't really get a successful boot out of that. So there's an opportunity here. We have a sense of regions where U-Boot should not be writing stuff at this point in time — why can't we enforce that? These different memory regions, as defined by the hardware reference manual, have different semantics. And — of course, you can't read this — but there's a region of memory where there's on-chip boot ROM, a region of memory where there's on-chip boot RAM, and regions of memory that are assigned to memory-mapped registers. We know this statically. So why can't we enforce these regions' semantics? Although it's not all that static: there's an overall staticness to how the physical memory is laid out, but there are pieces that change over time that are just as useful, just as important. For example, where the stack is, or where the subsequent stage's image is loaded — we don't want to start writing arbitrary stuff there after we've loaded the image, because once you execute the loaded image, you might run into garbage.
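The check being argued for here can be sketched directly: before honoring the load address the image header asks for, consult a static memory map of the SoC and refuse destinations that aren't writable RAM. The region names and address ranges below are made-up illustrative values, not the real BeagleBone xM map:

```python
# Validate an image's requested load address against a static memory
# map before copying, instead of blindly following spl_image.load_addr.
# (start, end, label, writable) — illustrative ranges, not real hardware.
MEMORY_MAP = [
    (0x40000000, 0x4001FFFF, "on-chip boot ROM", False),
    (0x40200000, 0x4020FFFF, "on-chip RAM",      True),
    (0x48000000, 0x48FFFFFF, "MMIO registers",   False),
    (0x80000000, 0xBFFFFFFF, "external RAM",     True),
]

def check_load_addr(addr, size):
    """Return (allowed, region_label) for a proposed [addr, addr+size) write."""
    for lo, hi, label, writable in MEMORY_MAP:
        if lo <= addr and addr + size - 1 <= hi:
            return writable, label
    return False, "unmapped"

print(check_load_addr(0x80008000, 0x1000))  # external RAM: fine
print(check_load_addr(0x48002000, 0x1000))  # MMIO: refuse, don't reset the board
```

This only captures the static part of the map; the next step in the talk is handling the pieces that change over time, like the stack and the already-loaded image.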
So let's use this as an opportunity to add some enforcements. The idea is to divide execution into substages based on this memory usage and the goal of each substage. And this is where I finally introduce ARBWAC: address-region-based — really, region-based — write access control. This can be generalized to memory reads as well, and the related ELFbac project does this, but here I focused on writes, because for bootloaders that was a very nice thing to focus on. So the components of an ARBWAC policy are the substages — the distinct phases of execution — and the memory regions, each of which has a particular use during a substage. This is brought to you by a tool suite that I wrote a while ago — so it's in Python 2, but I'm continuing some of this work, so you might want to port that Python 2 yourself, or at least add parentheses to my print statements to get it to Python 3. This particular tool suite does static analysis; it also works with some statically generated data to figure out regions, and instrumentation helps us figure out regions. And then we can start writing policies and enforcing policies, all with that. So, getting ARBWAC applied to a bootloader: I found that with the bootloader there were three sorts of phases that were interesting from a policy perspective. There are loading phases — loading or preparing regions of memory for subsequent stages — patching, which is very similar, and bookkeeping, which is just bootloader-specific stuff; I also included hardware initialization in bookkeeping. Regions of memory were classified by intended use: code, which should not be writable; memory-mapped registers, which should sometimes be writable; bookkeeping data and state; and regions for the future substage's image, such as the binary that's being loaded. So how do we describe and enforce this? I first wanted to actually identify these phases.
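An ARBWAC-style policy as just described — substages crossed with region labels — can be sketched as a small table plus a write check. The substage names, region labels, and address ranges below are illustrative, loosely following the talk's classification, not the actual policy language of the tool suite:

```python
# Region-based write access control sketch: each substage of execution
# gets the set of region labels it may write; every write is checked
# against the policy for the current substage.
POLICY = {
    "hw_init":    {"registers", "bookkeeping"},
    "zero_bss":   {"bss"},
    "load_image": {"future_image", "bookkeeping", "stack"},
}

REGIONS = {  # label -> (start, end); invented addresses
    "bss":          (0x80100000, 0x8010FFFF),
    "future_image": (0x80200000, 0x802FFFFF),
    "registers":    (0x48000000, 0x48FFFFFF),
}

def region_of(addr):
    for label, (lo, hi) in REGIONS.items():
        if lo <= addr <= hi:
            return label
    return "unknown"

def write_allowed(substage, addr):
    return region_of(addr) in POLICY.get(substage, set())

print(write_allowed("zero_bss", 0x80100010))  # zeroing BSS during zero_bss: OK
print(write_allowed("zero_bss", 0x80200000))  # touching the future image now: not OK
```

The appeal of this granularity is that the table stays small and readable: one row per substage, one label per semantically distinct region, and it can be coarsened or refined as understanding improves.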
So my hypothesis was that I could identify loading and relocation by instrumenting all the writes and searching for something that looked like a memcpy — copying a sequence of adjacent bytes in a tight loop. I call these block write operations, each defined by a tuple of program counter, offset in the image, destination where it's written, size of the write, and call stack. With my instrumentation I built up a table, sorted it by size of block writes, and found regions of relocation and loading that I didn't actually know existed. I won't go into them specifically, but out of about 400,000 write operations in this bootloader, about 10,000 were block writes. A lot of them were just writing to the stack, but we saw the region where static data, which should be zeroed out, is actually zeroed. That's a setup phase for another substage, and you don't want to be touching much other than that region while you're trying to zero it out. And in other phases, where a target image is loaded, I found some relocation. So, just to give you a sense of the call graph: how I ended up dividing these phases of execution was based on entry points to certain functions. The ones colored yellow were entry points to a new phase, and then there's a success and a failure, depending on whether it hangs or jumps into the loaded image. Also, my instrumentation traced the call stack as it executed. An indentation in there means a function call, and if it returns, you'll see the indentation go back. So this is just a collapsed view of some of the calls the bootloader made, and returns, if it returned. This is tricky — let me get to the next slide. And I overlaid the call stack information with the block write information to identify these phases — and you can't really read this so well, I know. But time is along the Y axis.
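The block-write detection described a moment ago can be sketched as a coalescing pass over a write trace. The real tool also tracked the image offset and full call stack; this simplified version keeps just program counter, destination, and size, enough to show how a tight copy loop collapses into one block write:

```python
# Coalesce a trace of (pc, dest, size) write records: runs from the
# same program counter writing adjacent addresses (a memcpy- or
# memset-like tight loop) become a single "block write".
def find_block_writes(trace, min_len=4):
    blocks, run = [], None
    for pc, dest, size in trace:
        if run and pc == run["pc"] and dest == run["end"]:
            run["end"] += size       # adjacent write from same pc: extend run
            run["count"] += 1
        else:
            if run and run["count"] >= min_len:
                blocks.append(run)   # keep only long-enough runs
            run = {"pc": pc, "start": dest, "end": dest + size, "count": 1}
    if run and run["count"] >= min_len:
        blocks.append(run)
    return blocks

# Fake trace: one copy loop at pc=0x100 moving 512 bytes, plus two
# scattered stack writes that should not count as block writes.
trace = [(0x100, 0x80000000 + i, 1) for i in range(512)]
trace += [(0x200, 0x4020F000, 4), (0x210, 0x4020EFF8, 4)]
blocks = find_block_writes(trace)
print(len(blocks), blocks[0]["end"] - blocks[0]["start"])
```

Sorting the resulting blocks by span is the step that surfaced the 512-byte SD-card reads and the BSS-zeroing region in the actual traces.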
So the earliest writes are at the top, later writes at the bottom. And then I tried to correlate where in this call graph — what function call had been entered — while some of these block writes were made. And very clearly I found regions of block writes. The whole bunch you see on the bottom that are 512 bytes are reads from the SD card being written into memory. And I was able to break down this successful execution flow into a sequence of phases defined by what the bootloader was doing — bookkeeping, patching, or loading — and how the memory regions changed over time. I didn't draw the memory regions into this, so you only see the phases. But if we consider the left side of this bipartite graph to be the phases, the right side to be memory regions, and an edge between them to mean writable: while it's starting, only a few things are writable — the initial stack, some data region, some global data, and some registers. It cannot write to the BSS; it cannot write to the heap, because those don't exist yet. So we can enforce this up front. And so, just some final thoughts: regions seem to be a nice policy sweet spot in terms of granularity, and they seem to let me intuitively encode the behavior of a bootloader. I'm hoping to apply this further — not necessarily as a policy, but as a way of understanding software. I'm working on the DARPA SafeDocs program, and I'm hoping — working — to apply some of these ideas to parsers. And just as some related work: there's ELFbac, which you can look up, along with ARBWAC and my instrumentation. And yeah, thank you. So there's a picture of my kitty. I didn't expect her to interrupt the presentation, so I saved her for the end. Thank you. Thank you so much, Bex. That was very interesting. I'm always really impressed when people take the time to master such fundamental concepts.
So we're gonna give two or three minutes to participants to ask questions on Slido. You should have by now either the link or the QR code; otherwise see the Twitch chat. And in two minutes we'll come back and ask questions, if that's fine with you. All right, thanks everyone for being with us again. So let's start with the Q&A if you are ready, Bex. The first question I'm gonna ask is from an anonymous viewer: what prevents type-safe systems and policies from being a solution to provable software security? In general, I guess. Oh. Or maybe applied to — I mean, they are used in verified software. I mean, dependent types. Yes, dependent types I've seen brought in to be used as part of a proof in verified software. I think that — I mean, writing a proof for verified software is hard; knowing exactly what to describe, how to describe it, what to model, is tricky. And sometimes you also make assumptions that you don't know you're making. With the types here — at least the way I've defined the types — the thought is to use them on top of what we have, as sort of extra assurance, and not necessarily to be incorporated as part of a proof. But some of the components of it — typestate, which is also not my work, but very related — I'm not sure if it's been brought into software verification, but I think there are definitely relationships there. All right, thanks. So it's not so much that it makes formal proofs easier, but that it makes mistakes harder, let's say. Yeah, I think there's "behaves as expected" and "does not behave unexpectedly", and formal verification goes much more toward "behaves as expected". But sometimes — I mean, you can only verify what you model. So sometimes it's helpful to have that extra assurance.
Like, you don't necessarily want to have formally verified software and then have it run in a completely different environment, or an environment that doesn't, you know, check — Yeah, yeah, or that doesn't uphold the assumptions. Perfect, thanks. That answers the question. The next question: have you used formal methods such as annotating C code using Frama-C in order to give it more precise semantics? What are your thoughts on this? I actually have used Frama-C to analyze the bootloader — to analyze U-Boot. I wanted to use value propagation to understand, for every line of code, for every single address that can be written to, what might be written there. Okay, I didn't say that well, but I did use value analysis to try to understand whether we could statically know that certain pieces of code will only write to certain areas — I just wanted to see how far we could get there. One thing with Frama-C is that it can't handle assembly, and a chunk of a bootloader is exactly that. So I had to hand-write in C what's done in assembly, for Frama-C to analyze, so there's a chance for there to be a gap. And also, when you're looking at a bootloader specifically, part of your address space can be dependent on operations you do with memory-mapped IO — you might be setting up regions of memory and so forth. And a source code analysis doesn't necessarily have that information — I mean, it could — about regions of memory becoming available or unavailable as the bootloader executes. Right. All right, thanks. I think that's a very good answer. Thank you. Yeah, actually, in my thesis there's more information on my Frama-C analysis, which you can get from typefrugins.com. I was going to ask if this work has been published. So I guess, at least in the PhD thesis.
Yeah, I've given some talks, and then there's my PhD thesis, which I have available — the link was on my slides, which I can also publish. Yeah, perfect. Thank you. So the last question — it's a bit more general, so feel free to shortcut it a little bit. If almost anything can be defined as a type — variables, functions, classes — is there any concept in computer science that cannot be defined as a type? Is it universal, I guess? So, the thing with types is they can be quite powerful — dependent types, for example. For a type system, at least in programming languages, to be truly useful, you generally want the type checking to happen — or to be able to happen — before execution. So not all types can be expressed, or can be checked, period, in terms of just computability. I think a lot of the initial work was on computable types and computable type checking, and there has been — I say "we", but I'm not totally in the theoretical end — there's definitely been a lot of work incorporating types that are not necessarily computable, or at least type rules that are not computable. You'll see that more in the verification and proof-checking engines; that's where to look for the more complex ones. Right. So it's a bit out of scope, maybe, for a security conference. Well, it's still interesting. I think it's not totally out of scope, it's just a different focus. Yeah, perfect. So thank you very much for your time. It was really appreciated. Thank you. And now we're gonna have a five-minute break, and then we're gonna have the next talk, by Vitor Ventura and Paul Rascagneres, about fingerprint cloning: myth or reality. Thank you.