So, why are we worrying about this? Well, security hinges on where it matters. We may recognise those buildings. Those organisations are very concerned about the security of some of their operations. Conversely, some of us are very concerned that we carry out our own operations securely so that they can't look in. At the everyday level, providers of data centres, big companies, are concerned that the code on those servers is secure. I'd hate to suggest that the first two organisations were responsible for those words not making it onto the tape. So, databases, data centres, we want the code in there to be secure. And then, of course, the ubiquitous smart card, whether it holds your bank details, as in this example here, your medical details or your transport details: you don't want your bank account being broken into, and the train company doesn't want you taking free rides on the train. So security matters at the very smallest level as well as in the great big data centres.

And when it goes wrong? Well, when someone hacks an internet-connected toilet, it's quite funny. When you discover that Osram's latest light bulb is practically a textbook description of how to get security wrong, and it happens to leave a nice open hole into your home network, we start worrying. But then we get this. You'll recognise the Chinese webcams and the Mirai botnet, which managed to mount some very significant attacks by taking over internet-connected webcams. At that point security stopped being fun: oh my God, this really does matter. And it matters particularly because of what is known as the internet of things, perhaps more correctly the internet of sensors: increasing amounts of sensed information about your environment is connected to the internet and can leak out there. So security impinges more and more on our everyday lives.
And some of the attacks will affect us. The ability to power-cycle all the heating systems in the country and then take out the national grid, so that no one has any power, means people die on operating tables.

So, why the compiler? Although this is the LLVM dev room, the techniques I'm talking about are entirely generic, so I've put two well-known free and open source compilers up there, LLVM and GCC, compiling to run on a computer, and increasingly, for the internet of things, that may be a very, very small computer. But why the compiler? Well, your C/C++ embedded code will go through the compiler. Even with something like Java, which you think of as being interpreted, it's interpreted via a bytecode, and something like a compiler is going to generate that bytecode. Even your assembler goes through the compiler driver. Almost all code goes through the compiler; the exceptions are probably the sort of scripting languages that are source-interpreted, and, for those in the legacy camp, people who actually write hexadecimal machine code and just put the numbers into their machine. The point of using the compiler is that security is a whole-system problem, and on the whole the compiler gets to look at almost all the code. So if you want to ask "is my code secure?", the compiler is quite a good place to put that functionality.

So, let's have a look at some of that. There are two ways the compiler can help. First, it can tell you if your code looks like it's doing bad things. It's a helpful assistant, warning you that this looks bad, just as the compiler will today warn you that you appear to have an argument to your function that you're not actually using: did you mean this? So there's that warning role. And secondly, there's the heavy-lifting role. There are some very well-known and established secure programming techniques that don't get widely used because they're a right royal pain to actually code up.
And the compiler can help by hiding some of that for you. We've talked about this before. My colleague Simon Cook, who's at the back of the room over there, spoke at the LLVM developers' meeting about using LLVM to guarantee program integrity, and that's all part of the security story: you can only really talk about the security of your program if you have some confidence in its integrity. That talk was really addressing specialist hardware, hardware that provides some sort of support for instruction integrity and control-flow integrity. In order to use that, the compiler needs to understand it and generate code that matches it. He talked about the "protected" attribute that we added to a production compiler for one customer, which does exactly that. I'm not going to repeat that, because you can watch it in the LLVM dev meeting videos.

Some of the things that we've already done are very simple. Here is my security program, at which point it becomes clear I'm not actually a security programmer. I've got a function called mangle, which obfuscates my secret key by reversing all its nibbles. So there it is: I give it an argument k and it returns a value with all the nibbles swapped around. And that's great, because then I can throw away k and no one will guess what my original key was. So let's have a look at how that behaves. There we are in the main program: I've given it a string as an argument, which it converts into a value using atoi and passes to my function. So, let's see what happens next. We go in and call our function, and there's our key, because that happens to be 0xDEADBEEF in decimal, and the initial value of res is zero. Then we go round the loop, get to the end of the function and return. We've reversed all the nibbles and returned the result, so the output of the mangle function is DEADBEEF spelt backwards. And that's it.
We're back in main, we've finished our program and we've thrown away our key. Except we haven't, because we've left it lying on a bit of dead stack. Now, when we next call a function it may get wiped out, but for now it's sitting there in memory. And if I do something evil, which on this particular architecture is taking the address of my local variable and looking a couple of words further on, lo and behold, I can suck out my original key.

The simple thing we can do is something like this: let's have a new attribute called erase_stack. erase_stack adds to the epilogue of any function a small piece of code which just writes zero over the entire stack frame. So now when I return and help myself to that area of memory, it has just been zeroed. And of course you can then generalise that. It doesn't have to be zero: give the attribute an argument and we can fill the frame with alternating bits, and indeed all of those variants are fine; you'll just see different values in memory. You can even go one further and have a randomise_stack variant which writes random values over the stack. That has what may be a benefit: if you've got the sort of code that roams around off the end of the stack, it's probably going to behave erratically, because it's picking up random values. That may be a good thing, or you may be in the category that says "I don't like debugging programs which deliberately behave differently every time I look at them". You take your choice.

That's a straightforward thing, and it comes into the category of something we've done. It's not currently in a public compiler. One of the things we do is work for a lot of customers, writing them LLVM and GCC compilers for chips that are not yet out there, and therefore they don't want us to talk about them or go public, because they don't want to spoil their big hit when it all comes out.
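The erase_stack attribute described above is hypothetical (it is not in any public compiler). A manual approximation of what its epilogue code would do is to scrub sensitive locals before returning, through a volatile pointer so the compiler cannot discard the "dead" stores; the names here are my own invention:

```c
#include <stddef.h>
#include <stdint.h>

/* The talk's proposal would look something like:
       __attribute__((erase_stack)) uint32_t use_key(uint32_t key);
   with the compiler zeroing the frame in the epilogue.  Lacking that,
   scrub sensitive locals by hand.  The volatile pointer prevents the
   optimiser from eliding stores to a variable that is about to die. */
static void scrub(void *p, size_t n)
{
    volatile unsigned char *vp = p;
    while (n--)
        *vp++ = 0;
}

uint32_t use_key(uint32_t key)
{
    uint32_t local = key ^ 0xa5a5a5a5u;  /* stand-in for real key work */
    uint32_t result = local & 0xffffu;
    scrub(&local, sizeof local);         /* don't leave a key derivative
                                            lying on the dead stack */
    return result;
}
```

The attribute version wins precisely because the compiler knows the whole frame layout, including spill slots the programmer never sees.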
So this exists for a processor that, if I told you about it, I'd have to shoot you all. And now that's been recorded as well.

Let's go a bit further. That was about securing one particular function. What about setjmp and longjmp? Here I have my top function, which calls a load of functions. The chain goes through something called a middle function, which happens to have a key in it, because all my keys are called k: a 32-bit key. Eventually it gets to the bottom function, which has the jump buffer, and it does a longjmp back up to the top. So here we are in the bottom function. All's going well, there's my cryptographic key sitting in the middle function's frame, and then we longjmp, and I've still left my key sitting in the middle of the stack. Unless every one of my functions has been labelled with the attribute erase_stack or randomise_stack, I'm not going to get rid of that. And indeed, the way setjmp typically works is by just going back to where you started from; it doesn't unwind the stack, so none of that is going to work very cleanly anyway. However, since in longjmp we know where we came from and where we're going to, there's no reason in principle why longjmp couldn't wipe out the entire intervening stack.
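One way to approximate that idea today, without compiler support, is for sensitive frames to register themselves so a wrapper can scrub them before jumping. This is a hand-rolled stand-in for the talk's proposed erase-on-longjmp behaviour; every name in it is invented for illustration:

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

/* longjmp doesn't unwind frames, so frames holding secrets register
   their sensitive regions; secure_longjmp wipes them before jumping.
   In the talk's proposal this bookkeeping would live in the
   compiler/runtime instead, since longjmp knows both stack pointers. */
#define MAX_REGIONS 8
static struct { volatile unsigned char *p; size_t n; } regions[MAX_REGIONS];
static int nregions;

static void register_secret(void *p, size_t n)
{
    regions[nregions].p = p;
    regions[nregions].n = n;
    nregions++;
}

static void secure_longjmp(jmp_buf env, int val)
{
    for (int i = 0; i < nregions; i++)     /* scrub every registered    */
        for (size_t j = 0; j < regions[i].n; j++)
            regions[i].p[j] = 0;           /* region while still live   */
    nregions = 0;
    longjmp(env, val);
}

static jmp_buf top_env;

static void middle_func(void)
{
    uint32_t key = 0xdeadbeefu;            /* secret on the stack       */
    register_secret(&key, sizeof key);
    secure_longjmp(top_env, 1);            /* wiped, then jump to top   */
}
```

The scrub happens before the jump, while the frames are still live, so it stays within defined behaviour; a compiler-supported version could simply zero the whole region between the two stack pointers.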
Now, we could do it by having a special longjmp function, longjmp_secure, that adds that code. Or we could have -ferase-stack, which says: whatever the function, and even if it's a longjmp, erase the stack as you return, so you never leave stuff lying around. Or you could erase it to a particular value, or to a random value. That is all fairly straightforward, and we've done most of it. We haven't actually done the last bit, because the customer we were working for wasn't terribly concerned about longjmp: their coding standard means they don't use longjmp, not because they don't care about leaving stuff on the stack. So, in summary, we've got two attributes, one in two variations, to erase or randomise the stack on return, or we can do it at whole-program level, which also sorts out longjmp: erase_stack, erase_stack = n, or randomise_stack. That's all simple and not difficult to do.

But here is a harder problem. We have a customer who is really worried about security. One of the ways you can attack a chip is to take its top off and shine laser light on it, very high-intensity laser light, and every so often you're lucky and it changes a value in memory, and you can observe how the chip behaves, and that tells you things about the chip and indeed allows you to break its security. So you see strange things like this, where I set a global variable to a value, go and do some computation, and then test that the global value hasn't changed, even though patently it normally couldn't have. And of course the compiler understands that. If I compile with -O0 you can see the test: there's a little branch in there afterwards, checking that the thing it set before hasn't changed from what it set it to. But as soon as I turn on optimisation, the compiler says "this is a stupid test, I know it's that value, I don't have to check it again", and the optimiser has carefully taken away my clever bit of
secure code. Now, I can sort of fix that by saying that glob_var is actually volatile, some sort of dynamic register, so it makes sense to read it again, and if I compile that I get my branch back. The problem is that I've now made glob_var volatile everywhere, and if that test is the one place I care about, I've hammered performance across my whole program for the sake of getting that one right. The question is, and this is where we move from what we've done to what we'd like to do, how do we do that properly? Do we have to go down the route of pragmas around code, to say this particular fragment mustn't be optimised away? I hate pragmas, they're not syntactically particularly rich, but I don't know a better way of doing it. Anyone got a better idea?

From the audience: yes, I've got a better idea. The Linux kernel macro ACCESS_ONCE casts a pointer to a volatile pointer and then dereferences it, which has the desired effect. It's used for concurrent programming, where you need to say this might be changed behind your back by another thread.

Excellent, OK. So for the audio: the point is that Linux has macros which hide a cast to volatile and back again, so the volatility applies just to the local access. That's a good approach, ACCESS_ONCE. That's useful, I wasn't familiar with it; that's the sort of thing we could potentially do there. We're already getting value out of this meeting. This is intended to be a two-way discussion, because it's the start of the project; if you came and told me all this when we'd finished the project, that would be disappointing.

So let me tell you about this project. It's an academic project, funded by the British Engineering and Physical Sciences Research Council. It's a four-year project led by Professor Elisabeth Oswald and Dr Dan Page at the University of Bristol, who for a long time have run the computer science department's unit looking at information leakage. Over the four years it's got a team of RAs and PhD students, and
Paolo, wave your hand: Paolo's going to be one of them, so he'll be able to tell you about it in the future. There are still vacancies, in fact, at all levels, if this is a field that interests you. Embecosm is what's called an industrial supporter. It doesn't mean we get any of the money, but it means we've written to the government and said you really ought to fund this research project, because it matters to industry, and furthermore we're going to be employing people to do this, and we really care about it. One of the ways we're supporting it ourselves is by taking the PhD students who work on the project: some of them will come to us for internships in the summer, to do the information-transfer thing, to be exposed to real compilers and so forth. And we have a role beyond that, where some of the time spent, not so much by me but by people like Simon and Ed and Andrew Burgess and Graham Markall, is on actually implementing some of the things I'm talking about, and implementing them in a general way to get them into the open source community for mainstream processors, not just for top-secret systems.

How many people here think they know what information leakage is? Right, good thing I haven't wasted these slides, then. Wikipedia actually has quite a good definition: it's what happens when you have something that's supposed to be a closed or secure system, and an eavesdropper can nonetheless learn things about it. An example from the Second World War: the Japanese used secure radio transmissions, which the Americans hadn't cracked, to talk to their warships. However, they'd always transmit from a different station depending on what they were doing, and the Americans were able to learn what was going to happen from which station was transmitting. That's an example of information leakage. A technique made popular about ten years ago by Simon Moore and his team at Cambridge University is differential power analysis.
Power analysis is about looking at how much energy a computer uses; differential power analysis is about looking at how that changes in different circumstances. One of the things you can do is look at how the power changes as I try different ways of encrypting, or different values for encryption and decryption, and learn things about the encryption algorithm. Here's a very simple algorithm which takes my key and, if it's an odd key, subtracts one from it, and if it's an even key, takes the square root of the key. Clearly one of those is a cheap operation and the other is an expensive operation. If we run that, we can see that if I apply my program to seven, it uses about 25 microseconds of time on my laptop, and if I run it with an even number it takes 86 microseconds. I could explore more odd and even numbers, and I'd come to the conclusion that this algorithm does something different for even numbers than for odd numbers. So that's a very simple example of differential analysis.

And what was the mistake? Fundamentally this: we took our critical variable, our key, and we let it control where the flow of execution went. It's in an if statement. Anyone who writes secure code knows that if you're worried about security, you don't have your critical variables controlling branches or loops. That's easy to say, but if it's a big program and it's late and you're tired, or it was someone's birthday and you had a couple of pints at lunchtime, you might have got it wrong. So how can we help? And we don't just need to worry about data-dependent control flow. We need to worry about data-dependent instruction timing: you may think there's no change in the control flow, but if the value affects how long the instruction takes, that will leak. And even data-dependent memory access: if it's this value I go to RAM, and if it's that value I go to flash, and flash and RAM do not have the same energy profiles. So here's my function of concern. One option is this,
which is to say that my argument k has an attribute which I've called critvar. I should have said: we're now in the territory of things we want to do, not things we have done, so this is where feedback is welcome. Now, hopefully, I'll get a warning that says: hold on, you've got a critical variable controlling flow, because the compiler can see it's part of the expression controlling this if statement. That seems okay. And even if I put my critical variable declaration in a header, because I'm including that header I've still got all the information; the critical variable usage is pretty obvious in those circumstances.

But it's not quite so simple. What about this not-so-simple case, where I actually assign the bottom byte of k to b and then branch on b? Well, yes, I probably ought to be able to get that right and spot that b, although not itself the critical variable, is directly derived from the critical variable, and branching on it is still a bad thing to do. But to do that I need to understand the local data flow. So this is not something we're going to solve by a simple syntactic look at the program; we've got to understand the data flow through the program.

Now what about this case? I've got two source files. One has func1, whose argument is a critical variable, and it calls func2, passing in k. But func2 has nothing to say that its argument is a critical variable, and it might be used in lots of places, some of which are not critical. So I compile each file and link them together, and I get no warnings, because individually neither compilation can see a critical variable being used to control flow. The only solution I have is link-time optimisation: this is something that has to go in an LTO world. And, from the audience: yes, you could use runtime data-flow tagging. We haven't explored that, so it's on the tape, another one to take away and look at in the investigation, and Paolo is listening over there, so I want
for him to look at it. So yes, clearly if func2 had the attribute too, that would be fine. But, as the audience points out, func2 might be called from many places, only some of which involve critical variables, so marking its argument critical all the time may be inefficient; you could specialise, rather like const analysis, where const code calling a non-const function needs some casting. So, for the tape, the point made was that func2 might be used multiple times, so making its argument a critical variable all the time may be inefficient if it's used widely in non-critical contexts.

So the point is that critical-variable usage needs to understand global data flow. That's generally true of security, and of its sister discipline safety, which is the other side of the same coin: these are whole-system problems, and you need to look at the whole system. So data-flow analysis, or runtime instrumentation, those sorts of techniques, are what we're going to need. In summary, on this one approach of critical variables: simple cases are easy, but most cases need data-flow analysis, and that probably means LTO or, as we've heard, runtime analysis for programs of any size.

Now I have this question: what if we get it wrong? If the compiler tells me a critical variable is controlling flow, and it turns out it has been too cautious because it misunderstood the data flow, I'm going to spend ages tearing my hair out over a problem that isn't there. False negatives mean some bad code may be getting through, but if I didn't do this at all, all the bad code would be getting through. So I think we need to err on the side that false negatives are less of a problem than false positives, since false positives could waste an awful lot of time. From the audience: it depends on the case; if this is a very security-sensitive system, then it's the false negatives you can't afford. Yes, I agree; and if it's my toy code then I don't really care about any of that. That's a good point to make, but certainly for the
immediate term I think we're going to err on the side of caution. Of course the idea is never to get it wrong, but we might not get there first time. And it's worth remembering, as I said before, that it's not just control flow that leaks: variation in memory access, variation in instruction timing, as I mentioned, and don't forget energy. Some of you will know from previous talks about our energy work, on the compiler's impact on energy consumption. Here's a graph from James Pallister, who spoke here a couple of years ago. This is a well-known 8-bit processor, where we multiplied every pair of 8-bit numbers using its multiply instruction and measured how much energy each took. From this you can see they've implemented a 4-bit Booth multiplier; I'm sure they didn't realise their Verilog was quite so visible to everyone. If I'm writing secure code using multiplication, even just using the multiply instruction, the values going in are going to leak information. If anyone knows why 1x19 is the most energy-efficient multiplication, I'd be interested.

So that was the first category, warning you about things that look bad. Now for the heavy lifting, and the heavy lifting is about making life easier. Here is a technique called bit splitting. One way to find out someone's cryptographic key, if they put it in a nice 32-bit block in memory, is to slice the top off the memory chip, put it under a scanning electron microscope, read out the memory values, and look at all the 32-bit sequences for anything that might be a key. One way to defeat that is to scatter the individual bits here, there and everywhere, all over the place, so it's much harder; that's called bit splitting. Here I've just done byte splitting: I've taken my 32-bit key, put it in four variables, and I'll make sure my linker script puts those four variables in completely different places. And then, of course, I now need a function to add
one to that key, where I add one to the least significant byte and, if that overflows, carry into the next byte, and you can see that it's tedious, and that's just bytes. The idea is that you spread the key through memory so it can't be scanned for. That ought to work, but it's hard work: I'm just adding one, and look at the code I need. And the problem is that optimising compilers are very good at spotting those patterns. They say: gosh, this is really clever, I could combine this into one 32-bit variable and save it somewhere. So you may have your four bytes scattered around, and then there's another bit of memory holding your 32 bits joined back together. The brilliance of the compiler completely scuppers us again.

So one of the suggested approaches is this: we give ourselves an attribute, bit_split, and then you can just write k++ and the compiler will worry about it. The compiler can even do the mind-numbingly tedious job, if you want, of putting every single bit in a different location. These techniques don't do anything for performance, but that's not the point. It's trivial for the programmer, and the compiler can do a really good job. It's all too easy, when you write these complicated things by hand, to make a mistake or, more importantly, to accidentally leak information anyway because you kept intermediate values around that were a bit leaky. Because it's inside the compiler, we can make sure the compiler knows that if it's labelled bit_split, it must not stick it back together. And it's still a whole-system problem: we've got to do it on a global basis. Oh, I thought there was one more slide there. So that is definitely on our radar as an early one to do: implement automatic bit splitting, so you can write secure code and have it come out right. It has the same issues of whether it's always possible, whether it gets hard, and so forth, but I think that one we probably can solve. Big-endian, little-endian: there's a whole extra can of worms there. We are planning to try to do this for a range of architectures, not everything for every
architecture, but to prove it across architectures, and indeed, we'll be careful to do it for GCC as well as for LLVM, because that's a good test of whether you've really got a general approach. From the audience: can you not do this as libraries, or as some sort of intrinsic that brings the thing into the compiler? Well, it works until you pass the argument to the library as a 32-bit value. But there's a balance, and the way this is often done is as a library, and in some sense you're putting the library, like an intrinsic, inside the compiler. I think those are all good approaches, and it may end up that this maps down onto a library call or an intrinsic, just as copying a struct maps down onto an intrinsic. We'd have to be careful to get the inlining right, how one calls the other and gets back to the original place, and we start to see why this is a four-year project. So these are all good comments: LTO, inlining and so forth.

This is not the only material; I've touched on two areas because I wanted to give you a flavour of what we're trying to achieve, but we have others on our list, and the idea is that these are the ones we'll explore. We want to take the best academic research. The stuff I've talked about so far has been known for years, but there's a lot of newer work, and we don't want that big gap between the academics knowing it two decades ago and people only getting to use it now; we want to shorten that timescale. That's the point of the link between the research and industry. Atomicity, and I've never quite understood why it's given that name, is about balancing control paths: if I've got to go down the two halves of an if-then-else, can I balance the paths so they take exactly the same time? Hard for a modern out-of-order processor; possible, though, for the smaller embedded processors you might find on a smart card. Superoptimisation for minimal leakage: that's because we're very big
into superoptimisation, and all problems get given to superoptimisation to see if it can solve them. Superoptimisation can optimise for any criterion, and information leakage could be one of them. Algorithmic choice: one way to hide what you're doing is to use different algorithms for the same job at different times, and certainly where multiple algorithms exist, that's a way to confuse analysis. There are lots of ways of taking a square root; if the program taking a square root used one of five different algorithms at random, that would spoil my energy profile a bit. Then, and this is where we verge towards the architectural side of things, instruction set extensions: can we actually improve the instruction set to make minimising leakage easier? We already have one customer case where the instruction set was changed because of a realisation that one particular part of the architecture was leaking more information than was desirable. So there is feedback to the hardware development group, and that's why it's always good to develop your compiler before you spin your first silicon: develop the two in parallel, because they can feed off each other. And the last one, instruction shuffling: I have a whole sequence of things to do, very predictable, but there's always a degree of scheduling flexibility, so let's shuffle them around and give you several different scheduling flavours, to let you get different profiles coming out.

And that's it. It's an early-stage project, and I welcome any feedback, questions and so forth. Question in the middle. From the audience: you may not need to go all the way down to full LTO to do some of this analysis. For example, I don't know if you're familiar with the Arm build attributes, which is how Arm deals with a lot of compatibility concerns between the thousands of different Arm variants: you annotate the binary and say, for example, this object uses the soft-float ABI, and so does this one, and the linker can
then cross-check at the attribute level, say these things don't match, and raise an error. So you might not need full LTO; you can annotate at the function level. So this is the suggestion that we don't need the full LTO machinery: we can use things like the Arm build attributes to annotate binaries with enough information to get that sort of thing right.

A small remark from the audience: unless your microcontroller has no cache at all, as soon as you have balanced branches and one goes on one cache line and the other on another, then as soon as some other code touches one of those cache lines, you can measure the time difference again. We investigated this academically at one point, executing both arms with conditional instructions: code balancing is almost impossible as soon as you go cached, or out of order, or any of those. True; but a lot of the very smallest devices, when you're looking at the internet of things, have no caches: they're in-order, uncached machines. I agree, though, that once you go beyond that, you're losing information all over the place, and you're solving a different problem. For the record, the question was that on cached architectures, balancing is not going to work.

Another question: from a hardware perspective, the techniques to improve things are often jittering or masking or similar; do you want to look into those as well, or is that left to the hardware? Well, the areas we tend to work in are deeply embedded, and we're often pre-silicon, so we're very interested in that. It's not a primary goal of this project to look at architectures directly, but the potential for the compiler to inform architectural decisions and engage with the hardware engineers is absolutely something we want to pursue.

And actually, on the cache-line question: I assume the bit splitting, or the byte splitting, is targeted at small embedded processors without caches? Because if not, you'll introduce
conditional data access, and you'd actually need something more for your key, a slightly different kind of increment, given what caches do. So this is a comment that once you have caches to worry about, bit splitting ends up having other defects. You're right: it's aimed at the very smallest processors, and I think there are very open research questions in how you make bit splitting work in the context of cache lines and so forth without leaking more information than you're trying to hide.

Another question: I'm wondering why you didn't put the attribute on the variable itself, because then you only need to zero that one variable, and that works with malloc'ed storage as well. You don't need to worry about longjmp and setjmp, you don't need to worry about erasing huge stacks, and some people do like huge stacks, and you can also do it on the heap. So the question is: why don't we erase the individual variable, rather than flattening the whole stack? The answer is because we hadn't thought of that when we had the conversation with the customer; flattening the whole stack was what interested them. Interestingly, the feedback afterwards was "we're not sure we're going to use it, because now we have to get our compiler revalidated", since you're affecting the security of the generated code. So it cuts both ways.

And the comment is that with erase_stack you probably also want to erase the registers. In fact, I believe the implementation we have does erase the registers as well. And, from the audience: if you've got different register widths, you might need to somehow clear the top halves of registers too. Yes, it might get a bit harder around clearing halves of registers. Any more questions? No? Then thank you all very much.