Okay, thank you for coming today. Our talk is on hardware that's tolerant to hardware Trojans, and we're going to talk a bit more about that and about supply chain security in practice. The highlights of this talk: we start by discussing the private life of keys. There are also some weak links in this supply chain that we're going to bring up, and then we have some lessons learned from airplanes. Then we'll see how we can transfer this to crypto hardware, where Dan is going to do our demo and describe the architecture we came up with, and then I'll describe the protocols and some of the magic we do. Finally, we'll close the presentation with some talk about politics and how we can exploit it in crypto hardware. To begin with, let's think about a private and public key pair: how do we generate it? First, a design house somewhere designs an integrated circuit, which is then fabricated at a foundry, probably somewhere else, and then the integrated circuit is delivered to the hardware vendor that ordered it. The vendor loads its firmware on it and assembles the actual device around the integrated circuit, and then the device is sent to the customer that bought it. The customer uses the device to generate and store the key.
The problem with this lifecycle of keys is that, in practice, if any of these steps gets compromised or attacked, the final key is weak or completely compromised. This is why we have hardware security modules (HSMs). HSMs were built with the purpose of protecting keys and performing all the operations on the device, so they provide some very neat features. Some of them are cryptographic key generation, storage, and management, and all of those happen on the device. Then they have a whole set of features that have to do with tamper evidence, tamper resistance, and tamper response. Tamper evidence means that if you physically manipulate an HSM, this is going to be visible to its owner. Tamper resistance means that physically manipulating an HSM to retrieve the secrets stored inside is not trivial. Tamper response means that the HSM can take action if it detects that it's being manipulated: it may erase the keys or lock down completely. This makes it very hard for an adversary to retrieve what's in there. Plus, for companies that want to check all the boxes, HSMs are usually certified and validated to a very high level of security. The bottom line for HSMs is that all the operations are carried out on the device, so the secret keys and the other secrets stored inside never have to leave it. Because they provide very high security, they are used in lots of applications where high assurance is needed, like public key infrastructures, SSL connection accelerators, and payment systems. They are very popular, and companies are willing to pay the very high cost that comes with them. The 10K that we have here on the slide is the very low end; usually it's much higher. And then there are lots of other costs that are hard to quantify but quite high, which have to do with integrating the actual HSM once you
buy it, and then operating and supporting it throughout the years. Despite the high cost, HSMs protect only the last two steps of the private key life cycle, so the first four are still exposed. There have been cases where we've seen things go wrong in those first four steps, meaning that the security of the device was completely broken or non-existent. Because people know this, they have tried to come up with solutions, and there's lots of academic literature on it. The most popular of those are trusted foundries: you send your circuit design to a foundry that you completely trust not to insert any Trojans in it. The problem is that it's very expensive, and of course mistakes can still happen during fabrication. The other approach, which is more academic, is split manufacturing. There are a few foundries that support it, but it's still expensive and, again, errors may happen. The final one is post-fabrication inspection: you order your integrated circuit, they manufacture it, you get it back, and then you run some tests on it. The problem is that it's expensive: you need expensive tools, you need to constantly retool, and you need advanced techniques. It's also a huge pain, because if you order a few thousand chips you cannot test all of them, so it doesn't scale very well. Overall, it's an arms race, because hardware Trojan techniques are constantly advancing and adversaries are, and always will be, a step ahead, so you can never be 100% sure that nothing went wrong throughout the process. Even your trusted foundry may someday betray you and cooperate with someone. On another note, there is another community, the fault-tolerance community (so not security), that had a similar problem, and they solved it using redundant systems. Instead of using one integrated circuit, they use three coming from completely different supply chains
and they build either dual redundant systems, which let them detect that one of the two circuits is misbehaving and producing errors in the final results, or triple redundant systems, where all the computations are replicated across three different processors and at the end they take a majority vote on what the correct output is. This is actually used in autopilots on commercial aircraft, and I think also in space. The problem with fault-tolerant systems is that they are built for safety. They do that job very well because they replicate the computations, but they don't transfer well to security at all. Actually, they are bad for security, because what you end up with is a system that has three processors storing your secret key, meaning that if any one of your processors is compromised, you are exposed to attacks. So instead of improving your security, you increase your attack surface. For this reason, we came up with the solution we are presenting today, which also provides protection in the first steps of the key life cycle. I am Vasilios, and I did this work with George Danezis, Dan Cvrcek, and Petr Švenda. Here are the ingredients of our solution. There are two: the first is hardware components and the second is cryptographic protocols, and we need specific things from each. For the ICs, we need independent fabrication, so they must be fabricated in different facilities and their supply chains should be non-overlapping. They must be programmable, hopefully affordable, and if they are commercial off-the-shelf, that's even better. For the cryptographic protocols, we want protocols where none of the participating parties is trusted, the secrets are completely distributed, and operations are performed in a distributed rather than a centralized manner. They must also be provably secure, meaning that there is math that supports their security. So our hardware components are smart cards,
because there are many independent manufacturers and facilities producing smart cards, and their supply chains are indeed disjoint in terms of locations, designs, and foundries. They are programmable, they are certified to very high standards, and they are commercial off-the-shelf and actually pretty cheap. For the protocols, we use multi-party computation (MPC) protocols, which let you perform operations in a distributed manner, meaning that the key never sits at a single point. You can generate random numbers in a distributed manner, do key pair generation, which is what we are interested in, and do decryption and signing, which we are also interested in. These protocols have two nice properties: they keep you secure when all but one of your components are malicious and cooperate with each other, or when all your components are malicious but don't cooperate. Now Dan is going to take over; he is going to introduce our prototype and then move on to the actual demos. All right, thank you. With the help of Jack, I will try to show you a live demo. What so far was pretty much slides I will try to turn into a real product, and the real product looks like this: we have one prototype here on the table, and I will use it to show that the multi-party computation security we designed actually works. So what's inside the box? We've got many smart cards; this particular one has 120 of them, and we will use them in groups of three to show the scalability and other properties of the protocols we designed. You'll probably say that smart cards are pretty slow, cheap devices. Well, we can talk to each of them directly at over one megabit per second, so in that box we are talking to smart cards at more than 120 megabits per second, and I don't think 120 megabits is really that slow, even today. There is an FPGA to connect all the pins together, and those boards are connected through some standard internal hub
into the main internal motherboard, and then you just put it in a rack and use it at scale. So here are the main parts. There are 120 smart cards. We use Java Cards because they are easy to program, and we did some development so that we can easily use Java Cards from different manufacturers, something we presented earlier at Black Hat. Each smart card gives you very good physical security, several layers of it; it's very, very difficult to get inside and extract any keys. Among other things, all the memory and addressing in the smart card is encrypted, so deleting one AES key destroys all the information and turns it into random data. We've got an FPGA that connects the Java Cards and translates the serial protocol we use to talk to the smart cards into TCP packets, and then we've got an internal network hub and a main Linux server that runs an untrusted RESTful server, which provides connectivity to the outside. We've got three demonstrations. The first one is about showing geographically distributed control of integrated circuits. I will use my laptop and the black box next to me, which has 120 smart cards inside and runs the RESTful server I will talk to, but I will also try to connect to another set of 120 smart cards that are sitting right now in our Cambridge office in the UK. If everything goes well, I will load one of the dashboards. At the moment it's all green, because we don't have any data here yet. Let's start the glue that connects to the RESTful server, and now let's switch it on. You can see that I started the server with a configuration that lists two IP addresses hiding two sets of smart cards: one is a local address, and the other one goes all the way to the UK through a commercial ISP. This is an enumeration of the smart cards, and the RESTful server should be starting any second. So what is
happening is that the server will try to connect to all those smart cards. We take it slow, and it's still not started... yeah, here we go. It took a while, but in the end we got there. So now we have 240 processors available, each able to run multi-party computation for us. What's important about the system as we designed it is that not only can you use microcontrollers in different geographic locations, but each multi-party computation group can contain processors in different physical locations. So you can run a group that has one processor here in the room, another processor in Cambridge, and another one running as a Java Card simulator on any platform, such as ARM, Intel, or SPARC. That's a wide range of options to provide different supply chains and completely independent manufacturing processes. The only common point when you start using the crypto to generate keys is your laptop. Now I'll quickly switch the remote set off, so it should be much, much quicker; I've got just the local address for the board here. What I'm going to do now is use my laptop as a load generator and start running requests against the smart cards. I'm going to create 30 independent groups of smart cards with 30 different keys that can serve 30 different customers at the same time. The instances have been allocated, and now this shows transactions per second; it will run for about a minute. To give a bit of context, imagine that you use Bitcoin or blockchain technology and you've got 5 to 10 parties in a private scheme, and each transaction needs 10 signatures from 10 parties. With this computation, you can involve all those 10 parties in just one signature, while knowing that the signature needed the cooperation of all of them. So it's much easier to verify
signatures, because instead of going to 10 different ledgers, you've got just one master copy that you can replicate and verify independently. So this is a demonstration of the throughput of the whole system and its scalability, and later we'll show a few more graphs from our previous tests. The last demo shows how someone can try to attack the whole system, and how much effort they need to put in to actually succeed. Again, there is just one server running here, my laptop connecting to it, and a small group of smart cards that have a backdoor inside. The attack will go through the RESTful server, which is facing the internet, and will try to set the key I will use to some kind of default value, so the attacker can easily decrypt the data. Again, I'll switch to a new dashboard. I'm using a dashboard and also Node-RED to connect all the flows and show you something meaningful. Oh, it's bigger than I thought. Anyway, one of the main flows is to generate a key; I don't know how visible it is. So this is to generate the key, and then I've got three triggers that will allow me to compromise card 1, 2, or 3. Each time, I create a RESTful request that opens the backdoor, which sets the chip's key to a value that I expect, so I can attack the whole system. Let's initialize the dashboard so we get an initial state. There are three cards in use, because we did some experiments before. All is green, but you see that there is no key and there is no group that we can use. So let's run create. As a result, I now have three addresses, identifiers for the integrated circuits I will use. Now it requires a little bit of clicking. I'm going to do what the attacker would do: figure out which processors he wants to compromise. So this is the first one done, this is the second one, the last
one, almost there. The last bit I need is to tell the key generation algorithm which group of cards it should use, so: confirm, deploy, and make sure that all the cards are still secure. Now we try to generate a new key; there may be some delays here. You see that there is now a public key, and it's definitely different from the fixed key that the attacker set and wants the chips to share. The first step is to compromise card one: inject the backdoor and activate it. The card is compromised, it takes a few seconds, and now there is a new key, but you can see that it's still green, still secure. Imagine that to get this far, the attacker would either have to change the firmware, which can be verified by us or controlled by different parties, or compromise the manufacturing of the chips somewhere in the manufacturing process. Now I compromise card 2. Imagine that all those keys are 256-bit curve keys: if at least one card is secure, we still get 256 bits of random data, a random key. So the second attempt still doesn't succeed. Finally, I'll try the third card. I expect that as soon as the attacker has compromised three different places, three different chips that can be under the control of three different parties, manufactured by or running on different hardware, and he does all three of them, you can see that the key is as expected. Now we can decrypt all the data that we encrypted with that key, or forge our signatures. On the other hand, if I expect that this can happen and regularly refresh the chips that I use, then if I refresh just one of them and return it to a secure state, I again get a key that is absolutely secure from a cryptographic point of view. All right, so that was a bit of a live demo, and Vasilios will tell you what is actually happening inside. OK, thank you again. So yes, we built that system. For our demos we used groups of three cards to do all the computations. However,
someone may, for whatever reason, want to use fewer or more cards, so we tried to make our protocols scalable. For signing and decryption we do super well: you can use as many cards as you want and the processing time doesn't increase. For key generation, because we need very high assurance, this is not the case, but as you can see the time increases linearly, so you don't get a devastating delay as you scale. On our hardware here we used 120 or 240 cards, but if you use hardware remotely you can add as many processors as you want, and as you can see, for both decryption and signing the throughput increases linearly: the more you add, the faster you become. So, depending on your needs, you can decide how many processors to use. Now a little bit more about the magic going on behind the scenes. I kept it extremely light in terms of mathematics, so there is no math here. There are three plus one key properties we wanted from the algorithms we used. The first is that no single processor should handle sensitive material, such as secret keys, at any time. The second is that if a processor misbehaves and actively attacks the other processors or tries to trick them, the honest processors can detect it. The third is that if one of the honest processors is excluded from the protocol execution, the user can tell that this happened. And finally, we wanted a protocol that does well in terms of performance, which we achieved. A little side note: secret sharing is a very neat concept. Imagine you have three people who want to share a treasure map so that they can retrieve the treasure only if all of them get together; otherwise, no one can retrieve it. The naive solution would be to cut the treasure map into
three pieces, with each person getting one piece. However, there is a problem with this, because each piece leaks part of the information about where the treasure is, and someone may be able to use this information to find the treasure by himself. There are schemes called secret sharing schemes that let you split a secret into shares and then recombine the shares to reconstruct the original secret, and they have the very neat property that a share doesn't leak any information about the actual secret: as long as you don't have enough shares, you learn nothing about the secret they are hiding. There are two parameters the user can tweak: how many shares you split the secret into, and how many of those shares you need to reconstruct it. So you may cut the treasure map into 100 shares and need only 3 of them to reconstruct the original map. For our hardware we use a 3-out-of-3 scheme, which means that we split our secrets into 3 shares and all 3 processors must come together to reconstruct them. OK, so here are the operations. Classic key generation: you go to the HSM and request a new private key. The HSM generates a key internally, stores the private key inside, and returns the public key to you. The problem is that if the HSM has a malicious processor, that processor gains full access to the private key, and you have no idea whether the public key you get back is some sort of weak key or has any other problem. What we do instead is different. We have 3 processors, as you can see at the bottom of this slide. In step 1, you ask them to generate a public/private key pair, so each generates its own key pair, and then through a process they combine their public keys to form the common public key
that you can see in step 4. What's interesting about the common public key is that even if all the keys except one are compromised, as long as one key is strong, the final key is strong too. Going back to the key properties we evaluate algorithms with, we have all of them except the one about performance, because in step 3 there is some interaction between the processors, which adds some slowness; we saw it earlier in the graph showing how it scales. Then we have decryption, a pretty similar process. You go to the HSM and say: I want to decrypt this email. The HSM knows your key, so it decrypts the email and sends you the plaintext. Again, the same problem: the HSM needs to have your full key in a single place. Instead, we do distributed decryption. In step 1, you ask the HSM to decrypt your ciphertext. In step 2, the different processors generate what we call decryption shares. The decryption shares are not the plaintext; they are just shares, which are sent to Bob, and Bob can then combine them by himself to retrieve the plaintext. There is an added benefit: the HSM never sees the actual plaintext, because the final decryption step happens on the user's side. Again, we have all the key properties checked, except the one about detecting misbehaving processors, which makes no sense here because there is no interaction between the processors in this protocol. If a processor misbehaves, the plaintext at the end will be garbage, so the malicious hardware essentially reveals its existence and the user knows that something is wrong with his hardware. Then we have classic signing, the same process: you supply a plaintext and say you want to sign it. The IC again needs full access to your private key, and then it returns the signature for the document you provided. What we do instead is a bit
different. There is a first step, step 0, which is caching. When you set up the device you do some caching, once for thousands of signatures; it takes a few minutes, and you do it only once in the lifetime of the user. Then you move on to the actual protocol, which is what you execute when you want to sign a document: you send your document to the HSM, the processors generate signature shares, which in step 3 are returned to the user, and the user combines them to get the signature for the document. Again, we are very efficient, and at no point does anyone learn the full private key of the user. However, so far we've discussed cases with only three processors interacting, but our hardware uses many more, so we had the problem of how to make it scale, where scaling means adding more groups of cards. We ended up with 40 groups of cards, and then we had a problem with the keys: each group of cards can generate its own public/private key pair, but how can we make that key consistent with Bob's public key, so that all those groups can serve requests coming from Bob? We had to come up with another protocol for that, which we call key replication. The naive way to do it: say you have group A and group B, and group A holds the key for Bob. The processors in group A send their key shares to the processors in group B on a one-to-one basis. This looks fine. However, if processors A1, B2, and A3, which we see here as malicious, collude, they can retrieve the actual secret. This is very bad and we don't want it to happen, so this is clearly not the right way to do key replication. What you do instead is that each processor splits its own share into three sub-shares and distributes them to the processors of group B, and this is how you do it in a
secure way. It's pretty easy from a mathematical perspective, but we don't need to go into that. What's important is that by the end of the protocol, both groups A and B, or whichever other groups you have (we have 40 here, and you can have more), can serve requests for the same public key that belongs to Bob or anyone else. Now the politics part of the talk. So far we've been saying that our system provides security as long as there is at least one honest processor in the system: you can have many malicious ones, but if one is honest, you are okay. However, this is not always the case, or rather you cannot always be sure that not all of your components are backdoored. To be accurate, the adversaries capable of introducing hardware backdoors or Trojan horses are mainly governments, because they have deep access to fabrication facilities. They use very sophisticated techniques, their Trojans are very hard to detect, and even if you detect one, you are usually not sure whether it's an error, a bug, a manufacturing mistake, or an actually malicious act. However, they are very secretive: all those things are highly classified, and there is no chance that they will share the details of their backdoors with anyone. We wondered whether we could exploit this somehow. What it entails is that they are unlikely to collude or cooperate with any other adversary. If you remember, MPC (multi-party computation) protocols also provide security guarantees against another class of adversaries: adversaries that are all malicious but don't cooperate. So in this case, you can buy one processor fabricated in the US, another fabricated in China, and another fabricated in Russia, and even if all of them are backdoored, you can be certain that they will never cooperate or reveal their backdoors to one another, so you are safe and secure despite your hardware being
super compromised. So, concluding this talk: we introduced hardware that can tolerate faulty and malicious hardware. We have decent performance and it scales nicely, so you can serve as many requests as you need. We use off-the-shelf components, which is neat, as you'll see in a moment. And all the techniques we discussed, trusted foundries, split manufacturing, these kinds of things, you can of course use to source securely manufactured components and increase the security of the system. We are not competing with them; we can actually use them. What's interesting is that, yes, we took it to an extreme, but you can build your own Trojan-tolerant device. It's pretty easy: you buy a USB hub and a few card readers, depending on how many processors you want to have, download our MPC applet, and buy some cards from different countries. Do your research on where they are coming from, which manufacturer has its own fabrication facility, and where it is located. Some manufacturers actually provide these details, or you pay a little bit more and the cards are produced in specific fabrication facilities. So download the applet, review the code (please do that), then upload the applet to the cards, and you have your homemade HSM. It can't serve many requests, but I don't expect a single user to generate thousands of keys per hour or anything like that. And yeah, that was it. Thank you.
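For readers who want to experiment before buying cards, here is a minimal Python sketch of the 3-out-of-3 ideas described in the talk: additive secret sharing, distributed key generation where the cards' public shares are combined into one common public key, distributed decryption where the user combines decryption shares, and key replication by re-sharing instead of one-to-one copying. It is not the authors' MPC applet. Everything concrete here is an assumption made for illustration: the toy group (arithmetic modulo the prime 2**127 - 1 with base 3), the ElGamal-style encryption, the hash-commitment round standing in for the interaction in step 3, and the hypothetical helper names (split, distributed_keygen, decryption_share, combine, replicate). The parameters are far too weak for real keys.

```python
from __future__ import annotations

import hashlib
import secrets

# Toy group (illustrative assumption): arithmetic modulo the Mersenne prime
# 2**127 - 1 with base 3. Exponents are reduced modulo P - 1 (Fermat's little
# theorem). Far too weak for real keys; real systems use standard EC groups.
P = 2**127 - 1
G = 3
ORDER = P - 1
N_CARDS = 3  # 3-out-of-3 sharing, as in the demo groups


def random_exponent() -> int:
    return secrets.randbelow(ORDER - 1) + 1


def split(secret: int, n: int = N_CARDS) -> list[int]:
    """Additive n-out-of-n sharing: fewer than n shares reveal essentially nothing."""
    shares = [random_exponent() for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % ORDER)
    return shares


def commit(value: int) -> bytes:
    """Hash commitment to a public share (values are < 2**127, so 16 bytes)."""
    return hashlib.sha256(value.to_bytes(16, "big")).digest()


def distributed_keygen(fixed: dict[int, int] | None = None) -> tuple[list[int], int]:
    """Each card picks its own secret share; `fixed` simulates backdoored cards."""
    fixed = fixed or {}
    shares = [fixed.get(i, random_exponent()) for i in range(N_CARDS)]
    partials = [pow(G, x, P) for x in shares]
    # Round 1: every card publishes a commitment to its public share.
    commitments = [commit(h) for h in partials]
    # Round 2: cards reveal their public shares and everyone checks them.
    # (Always passes in this single process; across real cards it stops a
    # card from choosing its share after seeing the others.)
    assert all(commit(h) == c for h, c in zip(partials, commitments))
    public_key = 1
    for h in partials:
        public_key = public_key * h % P  # = G**(x1 + x2 + x3) mod P
    return shares, public_key


def encrypt(public_key: int, message: int) -> tuple[int, int]:
    """ElGamal-style encryption of an integer 0 < message < P."""
    r = random_exponent()
    return pow(G, r, P), message * pow(public_key, r, P) % P


def decryption_share(key_share: int, c1: int) -> int:
    """Computed on one card; the card never sees the plaintext."""
    return pow(c1, key_share, P)


def combine(c2: int, dec_shares: list[int]) -> int:
    """Computed by the user: the product of the shares equals public_key**r mod P."""
    mask = 1
    for d in dec_shares:
        mask = mask * d % P
    return c2 * pow(mask, -1, P) % P


def replicate(group_a: list[int]) -> list[int]:
    """Re-share group A's key to group B instead of copying shares one-to-one."""
    group_b = [0] * N_CARDS
    for a_i in group_a:
        # Card A_i splits its own share and sends sub-share j to card B_j.
        for j, sub in enumerate(split(a_i)):
            group_b[j] = (group_b[j] + sub) % ORDER
    return group_b


if __name__ == "__main__":
    shares, pk = distributed_keygen()
    c1, c2 = encrypt(pk, 42)
    print(combine(c2, [decryption_share(x, c1) for x in shares]))  # 42
```

Passing `fixed={0: 1, 1: 1}` mimics the third demo with two backdoored cards: the public key still differs from the attacker's target `pow(G, 3, P)` unless the third share is forced as well. Re-sharing keeps the sum of the shares, and therefore the public key, unchanged, while no card in group B simply receives a copy of a group A share, which is what defeats the A1, B2, A3 collusion described above.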