Welcome to my talk, and thank you for the introduction. I've done this work together with my colleagues Markus Muellner, Daniel Burian, Christian Kudera and Wolfgang Kastner. I usually start with a single slide on our lab. We are based in Vienna, with a very strong focus on hardware security and especially physical attacks, so we have some pretty nice equipment there. If you're interested in cooperating with us, I would be happy to hear from you.

We have three research fields. The last one is high-speed cryptography, and this is where this paper came from. There are two versions of the paper because we ran out of pages: the short version is the CHES version, and the long version is the extended version, which is referenced there. So if you want to read that, you can easily find it.

So what's the problem statement? As you might know, WPA2-Personal is pretty much omnipresent; everybody uses it. The standard says the minimum password length is eight characters. What we currently often see is that embedded devices in particular, for instance routers, cable modems or these wireless 4G access points that you can buy, use random passwords, but they are often very weak. They are maybe just eight characters long, and sometimes even the character set itself is limited. This, of course, is not a high-quality password, and the practical question is always how fast the attacker can get with his key guesses, that is, how fast he can brute-force these passwords. There is also a picture here of a cable modem; I will get back to that. You can see what these typical passwords look like, and this is really widespread.
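To put numbers on "weak", here is a small sketch of the brute-force search space for random eight-character passwords. The alphabets and the rate of one million guesses per second are illustrative assumptions, not figures from the talk.

```python
import string

SECONDS_PER_DAY = 86_400


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of candidate passwords of exactly `length` characters."""
    return alphabet_size ** length


def days_to_exhaust(candidates: int, guesses_per_second: float) -> float:
    """Worst-case time to try every candidate at a constant guess rate."""
    return candidates / guesses_per_second / SECONDS_PER_DAY


# Hypothetical alphabets for a random 8-character default password.
ALPHABETS = {
    "digits": string.digits,
    "uppercase": string.ascii_uppercase,
    "letters+digits": string.ascii_letters + string.digits,
}
ASSUMED_RATE = 1_000_000  # guesses per second; an assumption for illustration

for name, alphabet in ALPHABETS.items():
    n = keyspace(len(alphabet), 8)
    print(f"{name:15s} {n:>20,d} candidates, "
          f"{days_to_exhaust(n, ASSUMED_RATE):12.1f} days worst case")
```

The point of the sketch is only the scaling: restricting the alphabet shrinks the search space by orders of magnitude, while the password length stays at the eight-character minimum.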
So it really boils down to the four-way handshake of WPA2-Personal. Whenever the station would like to connect to the access point, the access point generates a nonce, and the station does as well. Then we have the first message from the access point to the station with its nonce. At this point the station can generate the PTK, the key material. Then the station also sends its nonce, and this time the message already includes a message integrity code (MIC). At this point the access point can do the same; it returns a message with its own nonce, but this time also with a message integrity code. If everything was fine, the station returns an empty message, but with a message integrity code. An important thing here is that we actually know the content of this message: it's empty, it uses zero padding bytes. So we can later use that to determine whether a password guess was correct.

The actual strong thing about WPA2-Personal is its key derivation. We use the passphrase and the SSID, and then we perform the well-known PBKDF2 function. This boils down to a lot of SHA-1 iterations, so it is computationally very complex. After that we use a pseudorandom function with the nonces and the MAC addresses, and we get out this PTK I mentioned earlier. The truncated version of this PTK is called the KCK, and it is then used to create all these message integrity codes by just using HMAC-SHA1.

So what does a practical attack look like? Well, of course, we need to capture the four-way handshake first. We can, for instance, also start to send out deauthentication frames and then capture the reauthentication. So we can easily capture the handshake, and then we can start guessing passwords. First we need to choose a password.
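The key derivation just described can be sketched with the Python standard library. This is a minimal illustration, not the FPGA design from the talk: the function names are made up for this sketch, the handshake values are synthetic placeholders, and it assumes the usual WPA2 parameters (4096 PBKDF2 iterations, the "Pairwise key expansion" label, and an HMAC-SHA1 MIC truncated to 16 bytes over the frame with its MIC field zeroed).

```python
import hashlib
import hmac


def derive_pmk(passphrase: str, ssid: str) -> bytes:
    """PBKDF2-HMAC-SHA1 over passphrase and SSID: 4096 iterations, 32 bytes."""
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode(), ssid.encode(), 4096, 32)


def derive_kck(pmk: bytes, ap_mac: bytes, sta_mac: bytes,
               anonce: bytes, snonce: bytes) -> bytes:
    """KCK = first 16 bytes of the PTK produced by the SHA-1 based PRF."""
    label = b"Pairwise key expansion"
    data = (min(ap_mac, sta_mac) + max(ap_mac, sta_mac)
            + min(anonce, snonce) + max(anonce, snonce))
    # A single PRF iteration (counter byte 0) yields 20 bytes, enough for the KCK.
    block = hmac.new(pmk, label + b"\x00" + data + b"\x00", hashlib.sha1).digest()
    return block[:16]


def compute_mic(kck: bytes, frame: bytes) -> bytes:
    """HMAC-SHA1 over the handshake frame, truncated to 16 bytes."""
    return hmac.new(kck, frame, hashlib.sha1).digest()[:16]


def check_guess(guess: str, ssid: str, ap_mac: bytes, sta_mac: bytes,
                anonce: bytes, snonce: bytes, frame: bytes,
                observed_mic: bytes) -> bool:
    """Return True if the guessed passphrase reproduces the observed MIC."""
    kck = derive_kck(derive_pmk(guess, ssid), ap_mac, sta_mac, anonce, snonce)
    return hmac.compare_digest(compute_mic(kck, frame), observed_mic)


# Synthetic "captured" handshake, built here from a known passphrase.
SSID = "ExampleNet"                      # hypothetical network name
AP_MAC = bytes.fromhex("001122334455")   # placeholder addresses and nonces
STA_MAC = bytes.fromhex("66778899aabb")
ANONCE = bytes(range(32))
SNONCE = bytes(range(32, 64))
FRAME = b"\x01\x03" + bytes(97)          # stand-in for the frame bytes
OBSERVED_MIC = compute_mic(
    derive_kck(derive_pmk("ABCDEFGH", SSID), AP_MAC, STA_MAC, ANONCE, SNONCE),
    FRAME)

for candidate in ("AAAAAAAA", "ABCDEFGH"):
    ok = check_guess(candidate, SSID, AP_MAC, STA_MAC, ANONCE, SNONCE,
                     FRAME, OBSERVED_MIC)
    print(candidate, "match" if ok else "no match")
```

Almost all of the work per candidate sits in `derive_pmk`; the PRF and the MIC add only a handful of further SHA-1 invocations.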
We derive the KCK for that, using all the information we obtained from the four-way handshake, and then we check whether the computed MIC is the same as the one that we observed. If it is, we have found the password.

I mentioned that we have a high computational complexity. This is why WPA2-Personal is actually pretty good in terms of password security: if you have a good password, even high-speed attacks won't help against it. Overall we have more than 16,390 SHA-1 iterations for each password guess, so this is really a lot.

The big question is how fast we can get. So we had a look, and what's currently marketed to be the world's fastest implementation is done by Elcomsoft and Pico Computing. They have an FPGA cluster that can do 1.7 million password guesses per second. The big question is: can we beat that? Another question is how much money we have to invest. For their solution, for instance, we did a price request, and their FPGA cluster costs $128,000, so that's not really in the range of amateurs. They also corrected their speed a bit, to 1.9 million guesses per second.

Okay, so much for that. Can we do better? For that we need to have a closer look at SHA-1. It works on 512-bit chunks, we get this 160-bit hash, and internally SHA-1 has 80 rounds. There is a message schedule: at first, the message is broken up into words for the early rounds.
For the later iterations we use a combination of the earlier message words. The important thing here is that the functions used are very easy to implement in hardware; they are not costly. The central piece of SHA-1 is this compression function, and once again we can see that most of the operations in here are ideally suited for a hardware implementation. The only expensive thing we need to deal with is on the right side: these rectangular symbols are 32-bit additions, and in those we of course have a carry chain. So this is the main limiting factor for a hardware implementation.

So we created an FPGA implementation that has the following design. We have an outer state machine that is just used to talk to the outside world. The important thing is that we have this password verifier and password generator inside the FPGA, and the central piece is this high-performance SHA-1 pipeline. Like I said, SHA-1 has 80 rounds, so usually you would have an 80-stage pipeline. In our case we have 83 stages due to a few optimizations. We have a buffer stage to reduce the pipeline input logic delay. The other problem is the additions, so we split up the additions.
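For reference, the SHA-1 message schedule and compression function can be written out in a few lines of software. This is a plain sketch for illustration, not the pipelined hardware design; the helper names are chosen for this example. The XORs, rotations and bitwise functions are cheap in hardware, while the chained 32-bit additions in each round are where the carry chains come from.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def expand(block: bytes) -> list:
    """Message schedule: 16 words from the block, 64 more from XOR and rotate."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w


def compress(state: tuple, block: bytes) -> tuple:
    """One SHA-1 compression: 80 rounds over a single 512-bit block."""
    a, b, c, d, e = state
    for t, wt in enumerate(expand(block)):
        if t < 20:
            f, k = (b & c) | (~b & d), 0x5A827999
        elif t < 40:
            f, k = b ^ c ^ d, 0x6ED9EBA1
        elif t < 60:
            f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
        else:
            f, k = b ^ c ^ d, 0xCA62C1D6
        # The chained 32-bit additions: the costly part in hardware.
        temp = (rotl(a, 5) + (f & MASK) + e + k + wt) & MASK
        a, b, c, d, e = temp, a, rotl(b, 30), c, d
    return tuple((s + v) & MASK for s, v in zip(state, (a, b, c, d, e)))


def sha1(message: bytes) -> bytes:
    """Pad the message and run the compression function over every block."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", 8 * len(message))
    state = IV
    for i in range(0, len(padded), 64):
        state = compress(state, padded[i:i + 64])
    return struct.pack(">5I", *state)


# Cross-check the sketch against the standard library.
print(sha1(b"abc") == hashlib.sha1(b"abc").digest())
```

In a fully unrolled pipeline, each iteration of the round loop becomes its own stage, which is why a long carry chain in one round limits the clock frequency of the whole design.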
Splitting up the additions is possible in SHA-1, and we have an init stage and an add stage to split up those additions. A few other optimizations we did: we always compute the HMAC outer state first; this way we don't have to store the intermediate result. Then, to avoid routing issues: if you have a very broad bus in the pipeline, you easily get all these routing issues, so we use block RAM delay lines instead. The state machine is of course pretty complex, and instead of having one huge multiplexer that would be very slow in the implementation, we split it up into several smaller multiplexers. Then of course we have custom build parameters to leverage all the internal functions of the FPGAs, and we performed extensive floorplanning.

This is the typical state machine that you have for the key derivation. It uses the SHA-1 pipeline, and you can see here that we compute the different states, and at the end we get out the MIC.

What the password verifier does is use the password generator. We have as many cores of the SHA-1 pipeline as we can fit in an FPGA, so the password verifier has to fill up each of the stages in each core. Then it just waits until the computed MICs are available, and later on we can make the comparison and see whether the password candidate was the right one.

We focused on low-cost FPGAs, and we did three different implementations. One implementation is for the Spartan-6 LX150. This type of FPGA has been used for Bitcoin mining, so you can just buy these boards on eBay, because nobody uses FPGAs for Bitcoin mining anymore. But for this kind of application they are ideal, and also cheap. Next, we use the newer Artix-7 on a development board. And for comparison purposes there is the Kintex device, which Pico Computing uses.
We also use the Kintex device, but we couldn't really test it because we didn't have these expensive FPGAs. We created an implementation for it, though, and we have the full results of the design suite.

So this is the first implementation, on the Spartan-6. We managed to fit two cores at 180 MHz on the FPGA. The next thing you typically need to care about with these things is temperature: if the temperature gets too high, you get bit errors. Unfortunately, neither the board nor the FPGA has a temperature sensor, so we implemented dynamic frequency scaling based on the error rate: as soon as we get errors, we clock down the FPGA a bit. You can already see the floorplanning here: you see the big pipelines, and in the center there is the state machine and the password verifier core. We also managed to fit three cores on that, but then we had some routing issues and the achievable clock frequency was much lower, so the overall performance of the FPGA was lower as well. We have that in an FPGA cluster. It's also low cost; we bought all the boards from eBay, so this is also something amateurs can do.

This is the Artix implementation. It looks a bit more distributed. We have eight cores, also at 180 MHz, and also dynamic frequency scaling. This time the Artix has an internal temperature sensor, so whenever the temperature is low it scales up its own frequency, and the other way round. So that's pretty neat. We have a star topology as well: in the center, once again, you have the password generator and the password verifier, and around that we have the pipeline stages.

And this is the Kintex: 16 cores at 216 MHz. Dynamic frequency scaling is implemented as well, because it has an internal temperature sensor, and once again a star topology. But like I said, we couldn't really test it because we didn't have this expensive type of FPGA.
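The core counts and clock frequencies above allow a back-of-the-envelope throughput estimate per FPGA. This sketch rests on two assumptions that are not spelled out in the talk: that each full pipeline core completes one SHA-1 computation per clock cycle, and that a guess costs exactly the lower bound of 16,390 SHA-1 computations mentioned earlier. It ignores communication overhead and frequency scaling.

```python
SHA1_PER_GUESS = 16_390  # lower bound quoted in the talk, used as an assumption


def guesses_per_second(cores: int, clock_hz: float,
                       sha1_per_guess: int = SHA1_PER_GUESS) -> float:
    """Idealized rate: one SHA-1 per core per clock cycle, no overhead."""
    return cores * clock_hz / sha1_per_guess


# (cores, clock frequency in Hz) as stated for the three designs.
DESIGNS = {
    "Spartan-6 LX150": (2, 180e6),
    "Artix-7": (8, 180e6),
    "Kintex": (16, 216e6),
}

for name, (cores, clock_hz) in DESIGNS.items():
    rate = guesses_per_second(cores, clock_hz)
    print(f"{name:16s} ~{rate:,.0f} guesses/s per FPGA")
```

Under these assumptions the rate scales linearly with both core count and clock frequency, which is why fitting a third core at a much lower clock can end up slower overall.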
So, the results. For all FPGAs we calculated the number of key guesses that we can do per second in theory. For a single Spartan-6 FPGA, that's 20,000-something passwords per second. Then we compared it to the actual measured performance, and the measured performance was also 20,800-something. It's a bit less because of the communication overhead that we did not consider in our calculation. The Artix-7 can do about 87,000 passwords per second. And now the interesting thing, for comparison purposes: what would happen if we used the hardware of Pico Computing, which can do those 1.9 million guesses per second? Well, according to our calculations and according to the reports by the design tools, we get out more than 10 million guesses per second, which is five times as fast as their implementation. So this is a new speed record on their own hardware.

We also did a GPU comparison, just to see how well GPUs perform. We used cudaHashcat with those well-known GeForce graphics cards and also GRID computing graphics cards. You can see that on the better graphics cards you can also achieve around 52,000 key guesses per second for a single card. But the big difference is that the price of the card is usually higher than that of these low-cost FPGAs, and you have much more power consumption.

Then we also performed a real-world case study. This is something that's only in the extended version of the paper. We wanted to have a look at the real-world impact, so we had a look at these widely available UPC cable modems that you can find everywhere.
They just have these weak passwords: eight characters, always uppercase, and random. We made the assumption that if you have one of these cable modems at home, with an SSID of UPC and a six-digit number, and you change its password to something more secure, then you are also likely to change the SSID, because who wants to have an access point at home with such an SSID? So this was the assumption. We started to collect some handshakes with these cable modems, and we tried to guess the passwords of some of these networks that we set up with our own cable modems. It turns out that we can break the password in three days at most.

Now the next question is: what's the real impact of that? How many of those Wi-Fi networks are there? We used the WiGLE wardriving Wi-Fi data set. It's of course not complete, but it gives us an impression of how many of those networks can be found. In the city of Vienna alone, we found more than 120,000 of these networks in the database. Then there is Austria and its border regions.
There it was 166,000, which is maybe due to the WiGLE data set, because in the rural areas there might be fewer people who do wardriving and then submit the results to WiGLE. So we could pick any of those networks, and in at most three days we could break into them. This is a map, so you can see that in the city these networks are really everywhere.

So let me conclude. We have a new implementation speed record if we compare it to what is currently marketed to be the fastest implementation. We showed that these professional-grade brute-force speeds can now also be achieved by amateurs, because they can just use those old Bitcoin mining boards for that. We also showed that FPGAs are ideally suited for this work. And we showed that real-world networks with these weak default passwords are not really secure, meaning that you can just break into them in no more than three days with this kind of hardware.

As future work, we would like to support password lists as well; right now our focus was just on random passwords. It would be great to build an Artix-7 low-cost cluster to evaluate it further, and it would also be great if we could try our implementation on an FPGA cluster like the COPACOBANA.

So, thank you for your attention. For more information, please have a look at the extended version of the paper. If you have any questions, please ask them now.