The next talk is "AuCPace: Efficient verifier-based PAKE protocol tailored for the IIoT" and will be given by Björn Haase. Björn is interested in critical infrastructure. The floor is yours. This is joint work with my colleague Benoît Labrique, who unfortunately couldn't attend. So, I'd like to talk today about a topic which most cryptographers tend to dislike strongly: passwords, a concept which generates lots of problems. This talk is about the question: if we are forced to accept that we can't avoid them, how should we make their use at least as secure as possible, even when facing tight resource constraints? And this talk is about the system-level approach that we have taken in our setting. I'm coming from a company which is providing equipment for the process industry. Process industry is something like refineries, chemical plants, pharmaceutical production, or drinking water supply, and we are producing equipment such as the sensors you see on the right side, which have to work under harsh environmental conditions. We are providing equipment which is also used in installations of critical infrastructure, where we think that security should really be considered. Security is a very new topic for industrial control. In a first step, when considering security, people tend to focus on machine-to-machine interfaces and protocols. The human-machine interfaces are often considered only in a second step. When we did this in a security assessment, we came to the conclusion that actually the human-machine interface is much more critical, or at least as critical, as the machine-to-machine interface, and provides the same attack vector. And the most widespread authentication mechanism that is used today is the password.
When deriving the requirements for our remote-access and wireless human-machine-interface solution, we observed that in important settings our customers will not have a public-key infrastructure. We observed that network access to a central authentication service is not always available, for instance if you consider air-gapped networks. So we needed support for offline authentication with local storage of credentials. Some devices have extremely tight resource constraints; some of you might recall my talk at CHES 2017 in Taipei on explosion-protected devices. Devices might become physically accessible to the adversary. And we know that we shall prepare the architecture for sophisticated demands such as two-factor authentication, but we need to accept that many of our customers won't be using it at the moment. So the result of our assessment was: if we are forced to work with passwords, then let's try our very best to protect our customers' installations. And we concluded that we need a combination of two elements: a verifier-based password-authenticated key exchange, V-PAKE, in combination with a state-of-the-art memory-hard password hash. We looked around, and astonishingly, there was no such established solution. So if we wanted to achieve these goals, we were forced to define our own protocol suitable for our setting. Our proposals are the two protocols augmented composable password-authenticated connection establishment, in short AuCPace, and its balanced subcomponent, composable password-authenticated connection establishment, CPace. The constructions were designed to allow more widespread use, so we aimed at, and got the okay from our management, to provide a patent-free solution which could be used in a larger context and possibly also in standardization.
This talk also considers preliminary results from a second review round, which is just now being carried out in the context of the CFRG working group. In this talk, I'll first present the two protocols, AuCPace and CPace, and their security analysis. I will show a short comparison with other V-PAKE candidates from the CFRG and elaborate on the implementation strategy on ARM Cortex-M4 and Cortex-M0 microcontrollers. The implementation used to be the fastest one, but I've recently found an even faster implementation, for which I will give the reference. So let's recall one slide from CHES 2017. Many of our sensors have to work from 30 milliwatts only, which is the power that you typically have for one single LED, and that's the power for the entire sensor. Security, and the human-machine interface, might be granted 1.5 milliwatts, which is a challenge because any computation that you make there will slow down the login process, and users tend to get angry if it takes longer than four seconds. So, in order to optimize for this constrained setting, we followed a system-level approach where we did not only try to improve the assembly arithmetic, but also tried to optimize the whole protocol construction for the constrained server. Our protocol allows for fast curves: many security proofs for PAKE protocols require prime-order curves, which tend to be slower. Our approach allows for x-coordinate-only algorithms, which might be easier to implement and avoid the need for point compression. In case we have a secure quadratic twist of the curve, our protocol allows for a simplified point verification, and the construction is designed in such a way that we don't need hashes over the full transcripts of the protocol in order to guarantee authentication. And one important point is, we defer the password hash to the powerful client entity.
On the second level, the group operations, we have also found some improvement in comparison to our CHES 2017 results, based on a somewhat hidden result from the Ed25519 paper. And finally, one important aspect has been the optimization of the field operations. Our AuCPace protocol is a two-party, verifier-based password-authenticated key exchange protocol. This means we have one side, the client PC, which might be a tablet PC, where we have the clear-text password available. There we typically have large memory and can calculate algorithms such as scrypt or Argon2. And we have the server side, where we have not the clear-text password, but the password verifier; in this context, we denote it with W. And this is typically, or frequently, a strongly constrained device. The feature of the V-PAKE protocol is that knowledge of the password verifier does not allow for taking over the client role. We have three subcomponents within the AuCPace protocol, which are shown here in three different colors. We have the AuCPace augmentation layer, we have the balanced PAKE subprotocol, the green part, and optionally we can allow for explicit mutual authentication of the session keys. Calculating the password verifier consists of two steps. In the first step, we have the memory-hard password hash; in our reference implementation, we use scrypt for this purpose. And then we carry out a group operation with a fixed base point. One specific feature is that we consider the complexity of non-prime-order groups with small cofactors. For establishing a session key, we start the protocol with the clear-text password on the client side and the verifier on the server side. The augmentation layer, which precedes the balanced PAKE protocol, starts by generating a key pair on the server side, which might be of ephemeral or of long-term type. In the case of ephemeral keys, we have the feature of full augmentation.
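The two-step verifier derivation and the augmentation layer described above can be sketched as follows. This is a toy model only: a multiplicative group modulo a prime stands in for the elliptic-curve group, and all names, constants, and scrypt cost settings are illustrative assumptions, far below production parameters, not the reference implementation.

```python
import hashlib
import secrets

# Toy sketch of the AuCPace verifier setup and augmentation layer.
P = 2**255 - 19          # toy prime modulus (multiplicative group, not a curve)
G = 5                    # toy fixed base point

def password_hash(pw: bytes, salt: bytes) -> int:
    # memory-hard password hash (scrypt, as in the reference implementation);
    # demo cost parameters only
    return int.from_bytes(hashlib.scrypt(pw, salt=salt, n=2**10, r=8, p=1), "little")

# enrollment: the client derives w, the server stores only the verifier W
w = password_hash(b"correct horse", b"per-user-salt")
W = pow(G, w, P)                       # fixed-base "scalar multiplication"

# augmentation layer: server key pair (x, X), ephemeral or static
x = secrets.randbelow(P - 2) + 1
X = pow(G, x, P)                       # sent to the client

# both sides derive the same password-related string PRS
PRS_client = pow(X, w, P)              # client: from X and the password hash
PRS_server = pow(W, x, P)              # server: from the verifier and x
assert PRS_client == PRS_server
```

Note how the server never touches the clear-text password: compromise of W alone does not let an attacker impersonate the client without first running a dictionary attack through the memory-hard hash.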
In the case of static key pairs, we have computational advantages, but realize only a somewhat reduced, partial augmentation feature. The username has to be exchanged; we look up the password verifier in the database, calculate the password hash, and calculate a Diffie-Hellman-style secret, which we call the password-related string, PRS. PRS is a value which we mustn't leak to an adversary, because this information would allow for an offline dictionary search. This PRS is passed over to a balanced PAKE protocol, CPace, the green part here. And there it is used for generating an ephemeral generator for the elliptic-curve group. This is a feature which is very similar to the PACE protocol as it is used in electronic travel documents. We also integrate all relevant associated data that we would like to authenticate into the channel-identifier field for the generator. For our reference implementation, we use Elligator 2 and SHA-512. Subsequently, we use Diffie-Hellman for generating a shared secret. We can realize a simplified point verification if the group has a secure quadratic twist, and the generated session keys match if the password-related string and the associated data match. Optionally, the session keys are subsequently explicitly authenticated. One feature of this authentication is that we don't need the transcripts. Regarding the security proof: in order to get such a protocol used more widely, you need to provide a security proof, and the proof strategy that we use is carried out in the UC framework of Canetti et al. In the first step, we concentrate on the green part, the balanced subprotocol, where we prove that our balanced subprotocol is indistinguishable from an ideal functionality which has been suggested by Canetti et al. in 2005.
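A minimal sketch of the balanced CPace step described above, again over a toy multiplicative group rather than an elliptic curve, with a plain SHA-512 hash standing in for the Elligator 2 map-to-group primitive; all names and values are illustrative assumptions.

```python
import hashlib
import secrets

# Toy sketch of the CPace core: derive an ephemeral generator from the
# password-related string and the associated data, then run Diffie-Hellman.
P = 2**255 - 19

def map_to_group(seed: bytes) -> int:
    # stand-in for hash-to-group (Elligator 2 in the reference design)
    return int.from_bytes(hashlib.sha512(seed).digest(), "little") % P

PRS = b"password-related string from the augmentation layer"
CI = b"channel identifier and associated data"
g = map_to_group(PRS + CI)            # ephemeral generator, as in PACE

ya = secrets.randbelow(P - 2) + 1     # client ephemeral scalar
yb = secrets.randbelow(P - 2) + 1     # server ephemeral scalar
Ya = pow(g, ya, P)                    # client -> server
Yb = pow(g, yb, P)                    # server -> client

# both sides hash the shared Diffie-Hellman value into a session key;
# the keys only match if PRS and the associated data matched on both sides
K_client = hashlib.sha512(pow(Yb, ya, P).to_bytes(32, "little")).digest()
K_server = hashlib.sha512(pow(Ya, yb, P).to_bytes(32, "little")).digest()
assert K_client == K_server
```

A wrong password leads to a different PRS, hence a different generator g, and the derived keys disagree without ever exposing the password itself on the wire.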
So we replace our real-world protocol with the ideal functionality. In the second step, we consider a functionality defined by Gentry et al. in 2006 and show indistinguishability between our entire augmented protocol and this ideal functionality. So we reach the conclusion that this protocol provides the composable security guarantees of the ideal functionality. One specific feature is that in the functionality, explicit key authentication is not mandatory but optional, which could provide an advantage when considering protocols such as TLS which themselves already include session-key confirmation. The security assumptions are based on the complexity of the computational Diffie-Hellman problem. We assume that the discrete log of a point generated by Elligator, or by a map-to-point primitive, is unknown. And we had to assume a programmable random oracle, which seems to be the minimum assumption for a UC-secure PAKE protocol. If an inverse map of the map-to-point operation is available, security is also maintained with respect to adaptive adversaries, which is something specific, or uncommon, for Diffie-Hellman-style protocols in the UC framework, which typically are only secure with respect to static adversaries. There's one specific aspect when using the UC framework: it's secure for an unlimited number of concurrent sessions, but this comes at the complexity that you need to define a session ID. One could easily generate such a session ID with an additional nonce exchange, but in our case, we came to the conclusion that we don't mandatorily need to prepend this nonce exchange prior to entering the protocol. Still, it is, in our opinion, important to have this nonce.
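One way such a nonce-based session ID could be formed, purely as an illustrative sketch; the concatenation order (initiator first) and the hash choice are assumptions here, not the protocol's specification.

```python
import hashlib
import secrets

# Sketch: both parties contribute a fresh nonce before entering the PAKE,
# and both compute the same session id from the exchanged nonces.
nonce_initiator = secrets.token_bytes(16)
nonce_responder = secrets.token_bytes(16)

def session_id(n_init: bytes, n_resp: bytes) -> bytes:
    # agreed ordering: initiator's nonce first (an assumption in this sketch)
    return hashlib.sha256(n_init + n_resp).digest()

sid_client = session_id(nonce_initiator, nonce_responder)
sid_server = session_id(nonce_initiator, nonce_responder)
assert sid_client == sid_server
```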
This is also in line with the results of Küsters, Tuengerthal, and Rausch, that we need to have a nonce if we want to use a random oracle in combination with joint state, as we are doing in our security proof. When comparing our protocol proposal with other recent protocols that have been presented, the most interesting candidates are VTBPEKE, by Pointcheval and Wang, and OPAQUE, which has been proposed by Jarecki, Krawczyk, and Xu. There are other PAKE protocols which unfortunately don't come with an explicit security proof. Four of these protocols are currently under review at CFRG for standardization. This table summarizes the results on the protocols which come with a security proof. OPAQUE and AuCPace provide stronger security guarantees than VTBPEKE by offering pre-computation attack resistance and universal composability; in the case of AuCPace, the pre-computation attack resistance is an optional feature. In comparison to OPAQUE, AuCPace considers the more powerful adaptive adversary model. Regarding the pre-computation attack resistance of AuCPace, it's included in the ePrint version of the paper, but not yet in the version available at TCHES. OPAQUE and VTBPEKE are monolithic constructions and merge authentication and session-key generation. This requires one message less than AuCPace. For OPAQUE, this comes at the cost of significantly larger password verifiers, even when considering point compression: for OPAQUE we are around 300 bytes, and for AuCPace we end up with around 64 bytes. AuCPace needs particularly little computational resources on constrained servers in the partially augmented configuration; we only have two variable-point scalar multiplications. This has been the main design target for the power-constrained settings. Unlike VTBPEKE, OPAQUE and AuCPace both don't mandatorily require mutual authentication.
This could be an advantage if integration into a protocol which already has this confirmation integrated is desired. AuCPace is a modular construction: we have the separation into the augmentation layer and the balanced PAKE, and this provides a possible advantage for separating the different layers when trying to integrate a V-PAKE protocol into the transport layer. The user-account complexity of the augmented PAKE can better be kept away from the transport-layer software components. So my proposal, for instance, when trying to integrate a PAKE protocol into TLS, is to integrate the balanced subprotocol CPace into TLS and provide the PRS string externally. Doing so would also allow for more flexibility, for instance by integrating two-factor authentication or smartcard-based authentication, which could be realized without modification within the TLS layer. And this would also allow for machine-to-machine interfaces based on low-entropy secrets. AuCPace is specifically designed for avoiding implementation pitfalls and for ease of implementation, avoiding errors like those we have just heard about in the previous talk, so that you're not tempted to use non-constant-time implementations or implementation approaches which might generate errors. Regarding the implementation, we succeeded in reducing a bit the computation cost for Elligator by using the method from the Ed25519 paper. This accounts for roughly 5% of the speed for the balanced subprotocol. That's very similar to what Riyad was talking about yesterday. And one important factor for the speedup was the improvement on the assembly level for the scalar multiplication. For the field operations, we first tried the Karatsuba multiplication that we had also been using for the M0.
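The Ed25519-paper technique referred to here merges the inversion and the square root that Elligator 2 needs into a single exponentiation, which works for primes p congruent to 5 mod 8 such as 2^255 - 19. A sketch in plain Python; the function name and structure are ours, an illustration of the technique rather than the reference code.

```python
# Merged square-root/inversion: one exponentiation gives a candidate for
# sqrt(u/v) over GF(p) with p = 5 (mod 8), as used for Curve25519 fields.
P = 2**255 - 19
SQRT_M1 = pow(2, (P - 1) // 4, P)        # a square root of -1 modulo P

def inv_sqrt_ratio(u, v):
    # candidate x = u * v^3 * (u * v^7)^((p-5)/8)
    x = (u * pow(v, 3, P) * pow(u * pow(v, 7, P), (P - 5) // 8, P)) % P
    if (x * x * v - u) % P == 0:         # x^2 == u/v: done
        return x
    x = (x * SQRT_M1) % P                # fix-up when x^2 == -u/v
    if (x * x * v - u) % P == 0:
        return x
    return None                          # u/v is a non-square
```

The point is that no separate field inversion of v is ever computed: the division and the square root are folded into one fixed exponent, which is what speeds up the map-to-point step.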
But in the case of the Cortex-M4, we found that accumulation was so fast by using the multiply-and-accumulate instructions that we fell back to the schoolbook multiplication strategy. We have ordered the sequence of the partial-word products, which you'll find in the tables on the right for squaring and multiplication, such that we are able to keep as many operands in registers as possible. One important difference to the previously fastest implementation of Diego Aranha was that we merged long-integer multiplication and reduction for the field into one monolithic function, so that we have fewer memory operations. And we avoided function-call overhead by using inline assembly for the addition and subtraction operations and the operation with the curve constant. We have generated the assembly code by use of an automatic code generator which handled the register allocation, because handling the register contents became so complex that we saw the risk of making mistakes. So we generated this by a script. When comparing the results for the field operations, the most remarkable difference between the previously fastest result of Diego Aranha, the upper line, and our line, is the squaring algorithm, where they had around 250 cycles on the Cortex-M4 while we ended up in the range of 155 cycles. This speeds up specifically the inversion and the exponentiation operation for Elligator. When using this improved field arithmetic, we succeeded in reducing the cycle count for X25519 into the range of 610 kilocycles, which is, in our opinion, even competitive with the much more complex code for curves which have endomorphisms, for instance FourQ. I would have liked to say it's the fastest known implementation, but as a recent update I would like to point you to the impressive work of Emil Lenngren, who even succeeded in reducing the cycle count down to around 550 kilocycles.
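The schoolbook strategy with multiply-and-accumulate ordering can be illustrated with a product-scanning sketch on 32-bit limbs. This models the column-by-column accumulation idea only, in high-level Python, not the actual Cortex-M4 assembly or its register allocation.

```python
# Product-scanning schoolbook multiplication on 32-bit limbs, mirroring the
# multiply-and-accumulate ordering: each result limb is finished in turn
# while partial products accumulate in a wide accumulator.
MASK32 = (1 << 32) - 1

def to_limbs(x: int, n: int = 8) -> list:
    # split an integer below 2^256 into n 32-bit limbs, least significant first
    return [(x >> (32 * i)) & MASK32 for i in range(n)]

def schoolbook_mul(a: int, b: int) -> int:
    A, B = to_limbs(a), to_limbs(b)
    n = len(A)
    r = [0] * (2 * n)
    acc = 0
    for k in range(2 * n - 1):                       # one column per result limb
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            acc += A[i] * B[k - i]                   # multiply-and-accumulate step
        r[k] = acc & MASK32                          # retire the finished limb
        acc >>= 32                                   # carry into the next column
    r[2 * n - 1] = acc
    return sum(limb << (32 * i) for i, limb in enumerate(r))
```

In hardware, this ordering means the same operand limbs stay in registers across several partial products, which is why the M4's accumulate instructions make schoolbook beat Karatsuba at this size.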
Our implementation of the full protocol is extremely flash- and RAM-efficient: we end up with less than 600 bytes of RAM and less than nine kilobytes of code for realizing the full AuCPace protocol, including random-number generation by ChaCha20, the hash operations, and also some operations which we need for firmware signature checks. So let me summarize. If you cannot avoid using passwords for remote-access authentication, we recommend using the combination of a V-PAKE protocol and memory-hard password hashing. The results of our system-level optimization strategy for constrained servers are our proposals AuCPace and CPace. We have shown a security proof for adaptive adversaries in the UC framework, and we have shown that this protocol can be implemented very efficiently on small microcontrollers such as the ARM Cortex-M0 and Cortex-M4, and that this implementation can be made even competitive with the fastest known approaches on these controllers which benefit from endomorphisms. We'd like to thank all the reviewers from CHES and also from CFRG for their care with our manuscript and their constructive and helpful feedback. Thank you. Time for one quick question. If there are no questions, let's thank Björn again.