Royal Holloway, University of London, and, for my sins, I'm a co-chair of CFRG; I'll explain in a moment what CFRG is. I think most of you, if you've got this far in the week and you're actually sitting here, probably already know what CFRG is, but I'll explain it anyway. The sign-in sheet is circulating, and there are two copies; if you could sign one of them and pass it along, that would be really great.

So, the agenda for today: I'm going to give a short overview of CFRG and talk about what we're doing with various documents. Then we'll have three presentations, each with discussion. The first will be by Shay Gueron, who is going to talk about AES-GCM-SIV: a 30-minute presentation plus 10 minutes of discussion. Then Joël Alwen will speak about memory-hard functions, in the same format, and then we'll have a shorter presentation from Andreas Hülsing about hash-based signatures, specifically the scheme that is currently making its way through our process and is actually almost there. At the end we'll have 10 minutes for AOB, and in that slot perhaps we can discuss whether this has been useful for the academics in the room and whether it's the kind of thing you would like to see more of. We'll finish at 3.30, and we'll get thrown out at 3.30, because they need to start disassembling the room.

Maybe I can go over this very quickly: you're all here, you all know that this thing is the internet, and you know that the IETF and IRTF are responsible for making sure it works. CFRG, the Crypto Forum Research Group, is one of the research groups of the IRTF, and it supplies crypto advice to the IETF. Here's a little extract from our charter; I'll give you a few seconds to read it. It basically says what we're chartered to do: a general forum for discussing and reviewing uses of cryptographic mechanisms, et cetera, et cetera, and to offer guidance on the use of emerging mechanisms and new uses of existing mechanisms. So it's pretty broad.

CFRG normally meets at IETF meetings, which are held three times a year at a variety of locations. The last one was in Buenos Aires, Argentina; the next one will be in Berlin, Germany, in July; and there's a whole schedule going out to 2027 or so, three meetings a year. A lot of work is also carried out on the CFRG mailing list; it's completely open, you can sign up online and contribute.

Formally, what this is today is an interim meeting. We have this capability — plenty of seats at the front, guys — to hold interim meetings if there's a specific topic of interest or if we want to broaden the set of people involved in CFRG. Informally, today is an experiment to try to increase academic engagement in CFRG and to figure out whether it's a model worth pursuing: maybe we could co-locate with key academic conferences each year, maybe with CCS, or maybe with Crypto, for example — probably not with TCC, though. That was a joke, by the way. For those of you who came late: the sign-in sheet is circulating; just put your name and your organization there, please. I'm required to keep a formal record of who's here.
Fine. The way that we work is we develop documents. They start off as Internet-Drafts and may be adopted by CFRG, which means they get looked after by CFRG and pushed through the process. You can see all of the documents that CFRG is currently managing at that link. These slides and all of the other presentations should be available online already; unfortunately the system isn't quite working — I've uploaded them all, but somehow you cannot actually download them at the moment, so I apologize for that, but most of these you can find online in other places.

We have a selection of drafts here that are currently being sponsored by CFRG, including AES-GCM-SIV, which we formally adopted a few weeks ago, and some signature schemes based on EdDSA, which are nearly complete and will be very useful for the TLS working group. We're also starting up some work on password-authenticated key exchange: we're trying to figure out what a PAKE should do before we start standardizing any. And finally, as we'll hear later from Andreas, we've got some hash-based signature schemes, so we're starting to get into the post-quantum world as well and think about security there.

A major recent success was the publication of RFC 7748. This is an output of this research group which specifies elliptic curves, including Curve25519 and a 448-bit curve (Curve448), which will be adopted in TLS 1.3 and hopefully will become very widely used in future. So this is an example of something that's gone through our process and is now available as an RFC.

Also, we're in the process of setting up something called the CFRG review panel, an attempt to formalize and supplement the normal CFRG review processes. The idea is currently under discussion on the mailing list, so feel free to contribute there and give us your opinion on whether it's a good idea or a bad one. People would be appointed for a term, maybe three years, and we expect the workload would be to review one or two documents a year and write a short report summarizing the qualities of the document and making recommendations about how it should be taken forward. The chairs — that's me and Alexey Melnikov — are looking for volunteers, or nominations of volunteers, to join the panel, so if you're interested in getting involved in that part of our work, please come talk to me. I hope the workload will not be very high, but I hope the output of those reviews will be very valuable to CFRG.

That's about everything I wanted to say in my introductory section. Are there any comments or questions? Let me just mention that for comments and questions you should physically come and take the mic, state your name, and then make your comment; that's how we'll handle it throughout the whole meeting, and that's how things are done in the IETF and IRTF. No questions? Then we'll move straight on to Shay Gueron, who is going to talk for half an hour about AES-GCM-SIV. Let me give you the mic.

Good afternoon, everyone. I'm talking about AES-GCM-SIV — that's what we call it — and this is joint work with Adam Langley from Google and Yehuda Lindell from Bar-Ilan University. I'll start with what it is in a nutshell, and then we'll get into some of the gory details.
What it is is full nonce-misuse-resistant authenticated encryption at extremely low cost, and I'll explain what the low cost means. We get almost the same performance as AES-GCM, and, by design, almost any optimization that AES-GCM can enjoy, its companion AES-GCM-SIV can also enjoy. I say "almost", and I'll detail exactly the one thing it cannot enjoy. It has a full proof of security and a full implementation, and we are going to post updated security margins very soon, actually showing that the security margins are even better than those of AES-GCM, thanks to a new twist we have introduced.

Some history: the first version is a paper by Yehuda Lindell and myself, published at CCS 2015. Then we made an extended version, joining forces with Adam, and we have just posted it as a draft for CFRG. What we did was add the 256-bit option — to look to the future — and some other things I'll explain.

So the features: first, you get nonce-misuse resistance, and I'll explain exactly what I mean by that. It's easily deployable: there is code available, and you can reuse the hardware and software primitives from AES-GCM — some functions you can just reuse if you wish. There are no patents attached, and it's publicly available: there's a GitHub repository where I posted the code in all the versions I could think of — reference code that requires nothing, optimized assembler (and macOS assembler), and also a C intrinsics version with more or less the same performance; of course you cannot expect C to compete one hundred percent with hand-tuned assembler, but it is quite close. Soon it will be integrated into BoringSSL; BoringSSL is Google's fork of OpenSSL, maintained separately. So, in a nutshell, we're ready to go.

First, I'd like to tell you that AES-GCM has been my favorite for years, and I've invested a lot of effort in trying to tune and optimize it. Here are some performance numbers for AES-GCM, in cycles per byte, so lower is better. In the beginning it was not very competitive: before you had instructions for AES and for polynomial multiplication, it had to be implemented with lookup tables, and it wasn't very fast — not a competitive proposal compared to, for example, RC4 with HMAC-SHA-1. That was the situation until around 2009, with the first processor generation where the AES instructions were introduced, together with an instruction for polynomial multiplication, called the carry-less multiplication. Together these took AES-GCM from 22 cycles per byte down to around 3. In the next generations there was further improvement — combinations of microarchitectural improvements (the same instructions, just implemented better by the processor) and improved software optimizations. As you can see, in 2013, the Haswell generation, it's around 1 cycle per byte; the next generation was three quarters of a cycle per byte; and the latest is 0.65. And 0.65 cycles per byte for AES-GCM encryption is very interesting, because this is the first generation of processors where you get the same performance for the authenticated encryption as for plain counter-mode encryption. So this is actually the best you can hope for from this kind of authenticated encryption. This is wonderful — and then you'll ask: so what's missing?
Let me just mention the instructions. There are the AES instructions, which I think many people already know, and the carry-less multiplier — the polynomial multiplication — which is an accelerator for the GHASH portion of AES-GCM. All of the code behind the performance I just described is already in OpenSSL, contributed to OpenSSL and to the NSS libraries, and of course it's in BoringSSL.

So what's missing, if everything is so wonderful? Nonce-misuse resistance. Here is a sketch of how AES-GCM works, and what's important to notice is that if you repeat an IV — a nonce — it has disastrous implications for both privacy and integrity (under the same key, of course). This is because of the way the IV is used: first you derive a hash key, then you set up an initial counter consisting of the IV and a one, and you do AES in counter mode; at the end you apply GHASH, a polynomial-evaluation universal hash function, over the ciphertext, and you mask it, XORing it with the AES encryption of a counter block. The counter blocks, the zero block, and the mask all have to be distinct; but if you repeat a nonce they are not distinct anymore, and then you lose both integrity and privacy.

So what is our goal here? Why not enjoy both worlds: let's make an algorithm that enjoys all the hardware and software support that AES-GCM has to offer, but with nonce-misuse resistance. Let me state exactly what I mean by nonce-misuse resistance. You input a nonce, plaintext, and AAD, and you get back ciphertext and a tag, and this is the property we want: if you input the same nonce and the same message, you're going to get the same ciphertext — that's an inherent property of deterministic algorithms. Otherwise, if you use a different nonce or a different message or both, you get the full security of the authenticated encryption. Unlike GCM, where if you repeat the nonce even once, you've lost privacy and authenticity. So that is what we want. Is it actually possible to enjoy both — to eat the cake and have it? The answer is yes, and that's what I'm going to explain. Any questions so far? This was just the introduction and motivation.

Now, some terminology I'm going to use. POLYVAL is a universal family of hash functions defined as follows: it is a polynomial evaluation of a set of 128-bit strings, and it looks very similar to the GHASH of AES-GCM; but since we were defining something new, it was time to correct something very annoying in the definition of AES-GCM. So POLYVAL and GHASH are not exactly the same thing, but they are built on the same construction: a polynomial evaluation over GF(2^128) with some reduction polynomial.

Let me show the relation between POLYVAL and GHASH. In GHASH there is one detail that is very annoying and leads to problems: when you multiply in the field, you're expected to compute in GF(2^128) with a specific reduction polynomial, but there is a small comment that the order of the bits within the bytes is also reversed. This seems like a benign comment, but what it actually says is one of two things: either you need to take the ciphertext bytes and flip their bits before you input them, or the field you are operating in is not GF(2^128) with that reduction polynomial. So there is a discrepancy in the definition, and it cannot be avoided just by saying the bits are reversed, because AES is defined through bytes, and the order of the bits within the bytes of the AES state is defined in a way that is opposite to that of AES-GCM. That's a fact. So it's about time to define it differently.

If that is the definition in AES-GCM, then POLYVAL in GCM-SIV is the following. The operation is not a multiplication in that field; I would call it a kind of Montgomery multiplication in the field, and the field uses a different representation of the polynomial — if you look, it is the same polynomial, just in flipped order. This organizes everything, and there is a really straightforward relation, left as an exercise: POLYVAL of a string with the key x·h (h is the hash key and x is the field element) is like doing GHASH where you byte-swap all the inputs and then byte-swap the result. So if you have a routine that does GHASH, you can use it to do POLYVAL, and vice versa. The nice thing is that in GCM-SIV we don't actually need to multiply by that x, because POLYVAL directly consumes the hash key without shifting it.
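To make the conventions concrete, here is a toy Python sketch of POLYVAL as just described. This is my own illustration, not the draft's reference code; it is neither constant-time nor fast, and is only meant to show the little-endian field arithmetic and the Montgomery-style product.

# Toy POLYVAL sketch (illustrative only; slow, not constant-time).
POLY = (1 << 128) | (1 << 127) | (1 << 126) | (1 << 121) | 1   # x^128+x^127+x^126+x^121+1
XINV128 = (1 << 127) | (1 << 124) | (1 << 121) | (1 << 114) | 1  # x^-128 mod POLY

def gf_mul(a: int, b: int) -> int:
    """Carry-less multiply, then reduce mod POLY (bit i = coefficient of x^i)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    for i in range(254, 127, -1):        # product has degree <= 254
        if (r >> i) & 1:
            r ^= POLY << (i - 128)
    return r

def dot(a: int, b: int) -> int:
    """POLYVAL's Montgomery-style product: a * b * x^-128."""
    return gf_mul(gf_mul(a, b), XINV128)

def polyval(h: bytes, blocks: list) -> bytes:
    """POLYVAL(H, X_1..X_n): s = (s XOR X_i) * H * x^-128, starting from 0."""
    hi = int.from_bytes(h, "little")     # POLYVAL is little-endian throughout
    s = 0
    for x in blocks:
        s = dot(s ^ int.from_bytes(x, "little"), hi)
    return s.to_bytes(16, "little")

Note how there is no bit reversal anywhere: the byte-swap only appears when you translate to or from a GHASH routine, which is exactly the relation stated above.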
All right, so now we finally come to what AES-GCM-SIV is. You get an input message and AAD; you have two keys, K and H — H is the hash key, the authentication key — and a nonce. First the message needs to be padded, exactly the same way as in AES-GCM: you pad the AAD and the message up to the next multiple of 128 bits, and you also append what I call the length block, an encoding of the total lengths of the AAD and the message; this too is exactly as in AES-GCM.

Then — unlike GCM, where you first encrypt and then apply the hash function over the ciphertext — here it's the other way around. First we apply the polynomial-evaluation hash, the universal hash function, to the concatenation of AAD, message, and length block; this is a unique encoding. Next we derive what we call the record encryption key: we take the nonce and encrypt it, and this is going to be the key used for the encryption. Now, the authentication tag is the encryption, with this record encryption key, of the value we got from the universal hash function, XORed with the nonce and with the top bit forced to zero — you'll see immediately why we do this. So we have 127 bits from this plus one more bit, giving us a full AES block; this is the tag.

For the counter block, we set the top bit to one. So the tag is the output of encrypting a value whose top bit is zero, while the initial counter block has a top bit of one, so we have separation. And here is the twist — a detail I didn't draw on the GCM slide: in AES-GCM you would take the low 32 bits of the counter block, zero them, and start the counter from zero (actually from one). Here we don't clear them: we just add the index of the block, modulo 2^32, so we are happy to do up to 2^32 − 1 blocks (in GCM it's up to 2^32 − 2 blocks, but that's a small thing).

And here is the idea: since we don't zero the counter to begin with, we get the following benefit. If your usage is such that you only encrypt messages shorter than 2^32 blocks, the security margin increases, because whatever entropy you got from encrypting the hash output counts in your favor. So with this we actually get even better security bounds than AES-GCM, assuming the usage does not encrypt messages as long as 2^32 blocks; and if it does — if there is no restriction on message length below 2^32 − 1 blocks — then it is the same as AES-GCM. The rest is easy: with the record key you encrypt the counter blocks and XOR them in; this is counter mode, and that's it. There is one small triviality about the length: if the message had to be padded, you need to deliver a ciphertext of exactly the same number of bits as the input.
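As a schematic companion to the flow just described, here is a toy sketch of the 128-bit encryption path. This is my own illustration: it reuses polyval() from the sketch above, uses the 'cryptography' package for raw single-block AES, and pads the 96-bit nonce with zero bytes when deriving the record key (the draft's exact derivation may differ). It is for intuition only, not a replacement for the reference code.

# Toy AES-GCM-SIV-style encryption flow (128-bit key case; illustrative only).
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def aes_block(key: bytes, block: bytes) -> bytes:
    return Cipher(algorithms.AES(key), modes.ECB()).encryptor().update(block)

def siv_encrypt(k: bytes, h: bytes, nonce: bytes, aad: bytes, msg: bytes):
    pad = lambda b: b + b"\x00" * (-len(b) % 16)
    # Length block: bit lengths of AAD and message (a unique encoding).
    length_block = ((8 * len(aad)).to_bytes(8, "little")
                    + (8 * len(msg)).to_bytes(8, "little"))
    data = pad(aad) + pad(msg) + length_block
    blocks = [data[i:i + 16] for i in range(0, len(data), 16)]
    t = polyval(h, blocks)                         # hash first (MAC-then-encrypt)
    record_key = aes_block(k, nonce + b"\x00" * 4)  # per-nonce record key (schematic)
    s = bytearray(x ^ y for x, y in zip(t, nonce + b"\x00" * 4))
    s[15] &= 0x7F                                  # top bit 0: tag domain
    tag = aes_block(record_key, bytes(s))
    ctr = bytearray(tag)
    ctr[15] |= 0x80                                # top bit 1: counter domain
    base = int.from_bytes(ctr[:4], "little")       # low 32 bits are NOT zeroed
    out = bytearray()
    for i in range(0, len(msg), 16):
        blk = bytearray(ctr)
        blk[:4] = ((base + i // 16) % 2**32).to_bytes(4, "little")
        ks = aes_block(record_key, bytes(blk))
        out += bytes(x ^ y for x, y in zip(msg[i:i + 16], ks))
    return bytes(out), tag                         # ciphertext length == len(msg)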
Now I'd like to show what we get for the 256-bit version. We wanted to upgrade, and this is a little more tricky. The input is the same, and we can apply the polynomial-evaluation hash and get the same thing; but how are we going to produce a record encryption key of 256 bits? For the 128-bit case it's easy: we just encrypt the nonce, get 128 bits of output, and we're happy — that's the new key. But how do we derive a 256-bit key? We had some mechanism, but eventually we updated it because it was confusing, and here is what we are going to do. We need a record key of 256 bits: the top part is the AES encryption of the nonce, and to get the low part we encrypt that value again — this little cascade gives us the 256 bits for the encryption. Here AES means AES-256, and the rest is the same: we separate, so the tag is generated by encrypting a value whose top bit is set to zero, while for the counter it is set to one, so we're absolutely sure they are separate. And this gives us a 256-bit version of AES-GCM-SIV.

Now, there is a nice flow diagram, but let's reason it through: what happens if you use the same nonce twice? You will get the same record key. If you use the same nonce and the same message, of course you get the same ciphertext — that's obvious. But if you use the same nonce and a different message, then the value T, which comes from a polynomial-evaluation hash — a universal hash function — is likely to be different. What do I mean by "likely"? As with AES-GCM, it depends on the length of the message, which determines how many roots the polynomial can have in the field, and so forth; we have an expression for this.

So you might ask: why do we invest the effort of deriving a new record key for each record encryption, rather than using the same key as in AES-GCM? Here, roughly, is the rationale. We are using 96 bits of IV — practically only 95 bits, because we force the top bit — and with 95 bits of IV, randomized by the universal hash function, we will get a disastrous collision after about 2^48 encryptions. Now, if we wanted to adhere to the NIST bounds that apply to AES-GCM with a random nonce, that would limit the number of times you can use the same key to 2^32. This is why we said: we will derive a new record key for each encryption, and this way we can meet the NIST security bounds with much more usage of the key — actually 2^48 uses, and even more. So if you take AES-GCM with a random nonce, you should not use the key more than 2^32 times, whereas here you can go up to 2^48. This is why I say we actually get better security bounds than AES-GCM.
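In symbols, a back-of-the-envelope version of that rationale (my arithmetic, paraphrasing the argument):

% AES-GCM with random 96-bit nonces: NIST's 2^{-32} cap on the
% nonce-collision probability limits the number q of encryptions per key:
\binom{q}{2}\, 2^{-96} \;\le\; 2^{-32}
\;\;\Longrightarrow\;\; q \;\lesssim\; 2^{32}.
% With a fresh record key derived from a 95-bit hash-randomized value,
% trouble requires an actual collision, which is only expected around
q \;\approx\; \sqrt{2^{95}} \;\approx\; 2^{47.5},
% which matches the quoted figure of roughly 2^{48} uses of the key.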
Here is a nice picture of the same thing — I don't know which was easier to follow, but you can see the flow. How much time do I have — 10 minutes? I can retire by then. Okay, so: performance.

This is in cycles per byte, smaller is better of course, and I'm giving you three generations of processors: Haswell, Broadwell, and Skylake — the latest — for both encryption and decryption. Take the latest one, at one kilobyte, two kilobytes: the asymptotic value you can see here is less than one cycle per byte, 0.95, 0.94. And, surprise surprise, for decryption you get 0.64 or 0.65 cycles per byte. So for decryption, AES-GCM-SIV and classical AES-GCM get the same performance; for encryption, as you see from these wonderful numbers, they don't. And this is the one thing this construction cannot do that GCM can. In GCM, because you first encrypt and then apply the hash function — and because the nonce must never repeat — it is possible to parallelize the encryption and the polynomial evaluation. But if you want nonce-misuse resistance, you first have to go over the whole message and compute the universal hash, and only then can you start encrypting. For decryption that's not the case: you can start decrypting and roll the polynomial evaluation at the same time, which is why decryption performance here is on par with AES-GCM; and encryption is exactly like AES-GCM if you don't parallelize the encryption and the polynomial-evaluation hash. It's funny that the trick of parallelizing these things in GCM actually came from me — I pushed it into OpenSSL, and eventually it bites back. But this gap between nonce-misuse-resistant AEAD and AES-GCM is inherent to the construction; you cannot bypass it.

You'll ask about the 256-bit version: there you also get wonderful results — for decryption, less than one cycle per byte, I think. It's like taking the other results and multiplying them by approximately 1.4, which is the difference between AES-256 and AES-128.
I'm also giving a few numbers to show what happens for short inputs — one block, two blocks, four blocks — and how many cycles you need in the 128- and 256-bit versions; here I don't count cycles per byte anymore, just total cycles. This is still better than what AES-GCM produces for short messages, but that's a separate issue.

Security bounds: there is a paper, and we are going to write a new one with the improved security margins, but basically this theorem is from CCS 2015. It is almost equivalent to the AES-GCM security bounds where the nonce is chosen randomly, and we are going to improve it further with the new key derivation. And that's it — I'm done; this slide is a repetition of the same one, and I think we can stop here for questions and comments.

Okay, so the way this works: we originally had 10 minutes scheduled for questions, but Shay has finished early, so we actually have more time. If you want to make a comment or ask a question, the way it's done at IETF meetings is to form a queue: the first person in the queue takes the mic and has a conversation, then goes to the back of the queue if they want to follow up — or, if it's a really important follow-up, they can ask the people in the queue to let them take precedence. So it's a little more involved than just putting up hands in the audience, but it makes sure that it's apparent how many people want to ask questions and that everybody gets served in the right order. So if you want to ask a question, come here and grab the mic. I see Atul coming, I see Philipp coming, that's great — and many more, very good. Over to you.

My name is Atul Luykx, from the University of Leuven. I'm just trying to understand the statement that the bounds are better. Are you saying the bound is better than AES-GCM when you're using a random nonce?

Let me explain what I mean by better security bounds. First, if you take AES-GCM-SIV as it is, where messages can be as long as 2^32 blocks (multiply by 16 to get bytes), the security margins are equivalent to those of AES-GCM, regardless of how you choose the nonce. I'm talking about AES-GCM with a 96-bit nonce, because if you take an arbitrary-length nonce, the security bounds are lower. But with our twist, we don't clear the low 32 bits of the counter block, and therefore, if your usage encrypts messages shorter than, say, 2^20 blocks, the remaining 12 bits give you 12 more bits of security. That is not a property you get from AES-GCM.

Is this for integrity or for privacy?

It goes into the integrity bounds. Actually, I would say counter mode should also do this: there is no reason in counter mode to clear the 32 bits and start the counter from zero rather than from whatever you want to put there. That's the statement.

Thank you. A related question: there's a competition ongoing, the CAESAR competition, to select new authenticated encryption algorithms. Why take something else, and not wait for the end of the CAESAR competition and standardize whatever comes out of it?
First of all, the reason we did not submit this to the CAESAR competition is very easy to explain: we were late by a year or so; it just came out later. But I want to emphasize, and I think I emphasized this on the mailing list: we're not trying to compete with CAESAR. Maybe CAESAR will output something wonderful. We're just saying that AES-GCM today is almost everywhere, ubiquitous, and without paying much in performance you can just switch the order in which you do the hash function and the encryption, and get this. So I don't see a conflict; if somebody comes up with something better, faster, more secure — yes, why not?

Hi, my name is Philipp, from EPFL. One thing I don't quite understand: for TLS, with the new nonce derivation, you don't really need a nonce-misuse-resistant scheme, right? In TLS 1.3 the nonce will be derived from the shared secret and the record number in a deterministic way, so you can actually ensure — if the implementers don't screw up and they're not using random nonces and so on —

You can never ensure that implementers don't screw up. But you're right, and this question was also raised on the mailing list, and we answered: if you are sure that the nonce will never repeat, then you're fine with AES-GCM. There is nothing wrong with AES-GCM except the fear that you might, by chance, repeat a nonce. If your application is absolutely sure nonces will never, never repeat, then AES-GCM is wonderful — that's how I started: AES-GCM is a wonderful thing.

I've got a couple of questions for you, Shay. One interesting feature of your scheme is that it's actually an instance of a MAC-then-encrypt scheme, and then the issue arises — thinking about implementers not screwing up — of how we make sure there's no premature release of plaintext. Obviously in your implementation there isn't, but in Joe Blow's random implementation of the scheme, which will appear on the internet in due course, there's a risk, isn't there: you just decrypt and output the plaintext, ignoring the result of the MAC verification. We've seen examples of that.

Yes, I've seen examples, yes. What can I say — it has to be implemented correctly. The fact that we do the polynomial evaluation — the hash of the message — first and then encrypt: we have a proof of security for that. But if you just decrypt something and don't check the tag, or check the tag and then ignore the result —

You're an experienced implementer, but there are plenty of people who could make that mistake.

The hope is that the crypto will be consumed from well-debugged libraries like OpenSSL or BoringSSL. What can I say — you can screw up in many ways.

Would you consider adding some very clear statements about this to the security considerations?

Yes, of course. I think we will add a disclaimer or warning, something like that, of course.

Okay, thank you. I'm Bertram, from Ruhr University Bochum. I see that you do key expansion in each encryption operation. Can you comment, please, on the cost of key expansion on modern CPUs? Because it used to be very expensive.
And second, it looks to me like you could get the same effect with a tweakable block cipher, so that you wouldn't have to expand the key: you would just compute your tweak and XOR it in. Isn't that possible? It seems very easy to do.

It might be — I cannot analyze it on the fly, but I agree there are many ways to build nonce-misuse-resistant AEAD. As for the cost of key expansion: it is not really expensive anymore — I would say around 50 cycles on the latest CPUs — and this is why we took the derived-key approach. For a long message it is completely negligible, and for a short message, okay, you paid 50 more cycles. By the way, I did many optimizations to do the first encryption at the same time as expanding the key, to alleviate the cost. I was worried at the beginning about exactly this, because for the 256-bit version we added another derived-key expansion, but it turned out to be a small cost: negligible for long messages, and for short messages you saw the table. But maybe there are other and better ways to do this.

Hi — Jean Paul Degabriele, from Royal Holloway. I'm wondering about efficiency. I've spoken to some people who are implementing AE modes on smart cards, and they say that when you have crypto processors dedicated to AES, such a polynomial evaluation doesn't buy you much time or efficiency — an XOR costs about as much as an AES computation, for example. Now that the Intel architecture has dedicated AES support as well, how does computing one block compare —

On a smart card?

No, on the Intel processor, where it's integrated in the architecture.

What we have is an instruction that does a polynomial multiplication of degree 63, and you need to write a good software flow to do the multiplication in the field; and for AES you need to know how to interleave the operations so that you can process many blocks in parallel. That is how you get these timings.

I mean in timings: how does the time for one AES computation compare to one XOR, or one polynomial multiplication?

Well, the throughput of AES — you don't do one block; the throughput of AES is 0.65 cycles per byte now. And if you ask how long one multiplication takes: again, you don't do one multiplication, you do several of them; that's part of the optimization.

But if you compute it as you go along, you do one multiplication at a time, basically.

Actually, I can give you the exact figure. The difference you see here between 0.94 and 0.64 is roughly the difference between doing the encryption and the authentication in parallel versus doing them serially. So the authentication part costs approximately 0.3 cycles per byte, and you can actually measure it: take the public code I posted, set the AAD to eight kilobytes and the plaintext to zero, and you will get this number. By that experiment you can evaluate the relative cost of the authentication versus the encryption.

So roughly, a multiplication is half an AES?

I wouldn't say it in those words.
I would say that the throughput of the encryption is roughly twice the throughput of the authentication. Thank you.

Kenny Paterson again, from Royal Holloway. I noticed that you seem to define a version only for 96-bit nonces, whereas GCM allows variable-length nonces. They're not widely used in practice, but they could be. Doesn't that mean this is not really a drop-in replacement for GCM, but only for GCM with 96-bit nonces?

First of all, the inclusion of an arbitrary-length nonce actually caused a degradation in the security margins: there was a mistake in one of the lemmas in the spec — it was the first time the spec was refuted — and the recommendation of Iwata et al. was to use a 96-bit nonce. The last thing I wanted to do was step on the same problem. So we are using a 96-bit IV, randomized through the hash function; basically, what we are doing is equivalent to AES-GCM with a random 96-bit nonce.

One more question: can you show the pseudocode for encryption? You had it on a slide. There — no, no, the pseudocode — okay, this one. In GCM, if I recall correctly, the hash key is derived by encrypting the all-zero block, whereas here you have it as a separate key. Could you explain the rationale for that, and also what difference it makes for the API, from the programmer's perspective?

Yes. First of all, we separate the encryption key and the hash key. In GCM — I read the paper by McGrew and Viega carefully, and they actually said you could do it this way, but they wanted to save: they take one key and derive H, the hash key, by encrypting the zero block with AES, and this is why the counter needs to start from one, and actually why you can encrypt only 2^32 − 2 blocks. We thought that if we were starting from something clean, we might as well make a distinction between the encryption key and the authentication key — a choice — and pay whatever it costs in the API. You just need 128 extra bits: whatever you have for the encryption key, plus 128 bits for the authentication key. From the performance viewpoint it doesn't really matter for long messages, and for short messages, because of other properties of AES-GCM-SIV, we are actually faster than AES-GCM. But yes, the cost in the API is there.

So from the point of view of a programmer, what does he or she have to pass into your function? Two keys, or one key that is then split inside? A pointer to the AAD, a pointer to the message, the lengths in bits of each?

The authentication key, the encryption key, and the nonce. It's the same API as AES-GCM —

No, it's not the same API as AES-GCM; you have an additional input, which is the hash key.

Yes — it is the same as AES-GCM plus another input. I think we misunderstand each other.

So now you're saying we have to educate another generation of programmers to pass another input to their encryption algorithm — the hash key, or the MAC key in this case — whereas AEAD as an abstraction requires only one key as input. You're changing the API that programmers will have to use.

Well, how do you do this for ChaCha20-Poly1305?

That's not my concern here; you're talking about replacing AES-GCM with something better, and I'm saying you're potentially creating problems.

Okay — I don't know how to address this right now. In theory we could tweak it to have only one key: consider it as if the key is 256 bits, or 384 bits, and then you split the key inside your algorithm.

Yes, you split the key. I don't think your draft currently reflects that way of thinking, and I think it might be helpful. These are the kinds of details that really matter when we want implementers to be able to use our constructions easily.

I didn't think about this — that's a good comment; we will add it. If our hands are twisted, we could make it with one key, but it's ugly. Whatever AES-GCM did there, I didn't like.
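For what it's worth, the single-key packaging suggested in this exchange is a small wrapper over the two-key interface. A hypothetical sketch, building on the siv_encrypt() illustration above (names are mine, not the draft's):

# Hypothetical single-key wrapper: accept one 256-bit key and split it
# internally into the 128-bit encryption key and the 128-bit authentication
# key, so the caller sees the usual one-key AEAD shape. Illustrative only.
def siv_encrypt_one_key(key256: bytes, nonce: bytes, aad: bytes, msg: bytes):
    assert len(key256) == 32
    k, h = key256[:16], key256[16:]    # encryption key, POLYVAL key
    return siv_encrypt(k, h, nonce, aad, msg)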
A follow-up question from the University of Leuven, on Kenny's question: what if developers then just make H equal to K? That's what they will do, right? If you don't have a second key, they will just copy the key. Is that okay or not?

A developer could do that, yes, and it's a wonderful question to analyze from the security viewpoint. I would have to actually write the proof; I think nothing bad happens — maybe it changes something in the bounds — but this needs careful analysis. What would we lose by that? I don't know; I have to think about it. It's interesting.

Bertram again, from Ruhr Bochum. Can you tell me about the key lengths? Is this mode designed for 128-bit security or 256-bit, or can you mix and match? Because it seems like the H length stays constant.

The length of the authentication key is 128 bits, and the length of the encryption key can be either 128 bits for one version or 256 bits for the other. The security proofs for the authentication are information-theoretic, in a sense, and we didn't think you would want a 256-bit hash key for the 256-bit encryption. By the way, that's also how it is in AES-GCM: the derived hash key is always 128 bits.

That's true, and this imposes the limitation on the maximum message length.

Yes, because if you are using a polynomial-evaluation hash, the number of roots of the polynomial is a limiting factor, and that certainly enters the security bounds.

And for this reason it would be nice to have larger H's, which you do not support.

Neither does GCM. And by the way, a longer H would mean operating in a different field, which would be more costly. You could certainly do it, but given today's maximum message lengths, I don't think this is a real limitation.

Any more questions? We're slightly over time now, but I think this has been a very useful discussion; I hope notes got taken — you've got some things to think about for your draft. Stepping outside of this meeting for a moment to make a meta-comment: that's very typical of the kind of interaction you have at a CFRG meeting at IETF. Somebody proposes something, and then people start to chip away at it, make suggestions, ask for changes, and ask for the rationale to be explained. So that's very typical of what you would see if you came to a normal CFRG meeting.
Just before we move on, I'd like to personally thank Shay for coming today: he wasn't at Eurocrypt, and he flew especially to Vienna from Israel to give this short presentation and answer our questions. Please join me in thanking him for making this special effort.

So we'll move on. Joël, do you want to — okay. Next up we have Joël Alwen from IST Austria, who is going to talk about memory-hard functions. The reason Joël is presenting this work is that it relates very closely to the work we're doing on Argon2, the winner of the Password Hashing Competition, and Joël is going to tell us about some of the research that's going on that's very closely related to that work. Thank you very much, Joël.

So yes, I'm going to tell you about memory-hard functions, and this talk really has two goals. One is to inform you about the directions the theory of memory-hard functions is taking, and in particular to get some feedback on that direction — maybe we're going in the wrong direction, or some things are good and there's more we could ask for. So I'm looking for feedback on what we've been doing on the more theoretical side. The other is that we have some results, in particular about Argon2, and I wanted to tell you about those so that you're aware, and so you can give feedback on whether they matter: if they don't matter for the standardization process, why not; and if they do, what the consequences are.

Some questions — a sanity check of sorts — to keep in mind during the talk: I'm going to tell you about the computational model we've been using to prove things, the complexity measure behind our statements (we want to say something is memory-hard: what does that really mean?), and the type of statements we've been proving. You can tell me what you think: are these things too weak, too strong, are we missing crucial details, are there things you like about it?

Memory-hard functions, in particular for password hashing, started from the observation that computation is very cheap for custom devices relative to general-purpose CPUs. You have this disparity between login servers and honest users, who will probably be using a general-purpose CPU, and attackers, who will try to brute-force these password hashing algorithms to launch dictionary attacks, for example, and who — if they're willing to invest a bit of money — will use custom hardware on which computation is very cheap. Simply iterating hash functions, as bcrypt does, somehow doesn't alleviate this disparity. So what we'd like are functions that require as much memory as possible to compute, because in custom hardware it turns out memory is relatively expensive; and we want this to hold even in parallel, because when you build a circuit you can put multiple cores on it — the whole thing is a parallel device.
What do we mean when we say it's expensive to implement memory-hard functions? We mean that we want the AT complexity — the area times time — of a circuit that evaluates the function to be large. AT is commonly used as an approximation for the cost, in resources like dollars, of a unit of rate for this circuit.

Colin Percival really kicked off this whole study of memory-hard functions, introducing the idea of using memory as a way to drive up the AT complexity of implementing these things in circuits, and he proposed the following definition. This is the starting point of the theory work, and it's a definition that Argon2 also tries to satisfy — generally we would like a password hashing algorithm to satisfy it, at least intuitively. The definition says: you want a function parameterized by a security parameter n, such that the function instantiated at n can be computed on a random access machine in time essentially linear in n. This is because the honest party is supposed to be able to evaluate these functions, and the honest party has no parallelism — what's important in that first line is that it's just a sequential random access machine. But as a security property, you want that the function cannot be computed, even on a parallel random access machine, with some space S and time T such that the product of the space and the time is anything less than approximately n². The idea is that we want high AT complexity: space is what we use to approximate area, time is time, and this should hold even on a parallel random access machine. So this is the definition, put forward by Percival, that got the ball rolling in this whole area.

There's another distinction I'd like to make, between data-dependent and data-independent memory-hard functions. Percival's original proposal was a function called scrypt, and one of its properties is that — at least in the straightforward way of implementing scrypt — the memory access pattern depends on the input to the function. The input here is a password, something secret, and what you don't want is that your memory access pattern leaks through timing information, cache-timing attacks, things like that. To avoid that, people have proposed data-independent memory-hard functions, where the memory access pattern is independent of the input, which makes it a lot easier for implementers to avoid these kinds of timing attacks. Argon2, in particular, has a data-dependent and a data-independent mode; that's why it has the two modes.
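In symbols, Percival's definition as narrated above (the notation is mine):

% f_n is memory-hard if the honest, sequential evaluator is fast:
%   some RAM algorithm computes f_n in time t(n) = O(n^{1+\epsilon}),
% while no parallel evaluator beats the quadratic space-time product:
%   every PRAM algorithm computing f_n with space S(n) and time T(n) has
S(n) \cdot T(n) \;=\; \Omega\!\left(n^{2-\epsilon}\right).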
The rest of the talk has two parts. The first is about the directions the theory is moving in — essentially about proving security for different functions. We're trying to prove security for Argon2; we have some statements about it and about some other functions, but rather than just state them, I want to build up to them: I'll tell you about the model we're using and the flavor of these statements. Then I'll tell you about attacking MHFs: we have some algorithms for efficiently evaluating MHFs which, at least in the asymptotic sense, are definitely attacks — they achieve AT complexity well below n² — and I'll tell you a little about them, so that together we can try to work out what practical implications, if any, these attacks have.

I'll begin with the security statements, and the first thing we need for that is a computational model. Proving unconditional complexity lower bounds is very difficult in general, but what's very lucky for us is that most MHFs in practice — practically all that I know of, including Argon2 — are essentially modes of operation over a compression function: there's not much computation going on other than calls to a compression function on the data. This is nice, because we can model the compression function as a fixed-input-length random oracle, and then we have a chance of analyzing things. But we still need parallelism, so it's not just the vanilla random-oracle model; and we have to make memory explicit, since we want to say something is memory-hard.

So we use the following model, called the parallel random-oracle model (pROM). You have an algorithm which is invoked iteratively; a priori it is stateless. At each iteration you give it a state — the state it output at the end of the last iteration, an arbitrary bit string. It gets to perform some arbitrary computation, then makes a batch of queries to the random oracle — this is the parallelism part: a batch of queries made simultaneously — then it receives the responses and again performs arbitrary computation, but "arbitrary" meaning it doesn't get to query the random oracle anymore. This enforces that if you want to compute hash-of-hash-of-hash, you really need three iterations. And we can now talk about the states kept between iterations, so we've made at least some of the memory explicit. At the end of an iteration the algorithm outputs a state — again an arbitrary bit string — which becomes the state for the next iteration. The input is essentially the initial state, and the iterations continue until the algorithm outputs a special final state, which is considered the output of the computation. This is how we model algorithms.
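A toy Python rendering of one pROM execution, with the bookkeeping used below for ST complexity. This is my own illustration — the algorithm interface and the 16-byte SHA-256-based oracle are stand-ins, not anything specified in the talk.

# Toy pROM execution loop (illustrative only).
import hashlib

def random_oracle(q: bytes) -> bytes:
    return hashlib.sha256(q).digest()[:16]   # fixed-input-length RO stand-in

def run_prom(algorithm, initial_state: bytes):
    """algorithm(state, answers) -> (new_state, queries).
    An empty query batch signals the special final state."""
    state, answers = initial_state, []
    max_state, rounds = len(initial_state), 0
    while True:
        state, queries = algorithm(state, answers)     # arbitrary free computation
        if not queries:
            return state, max_state * rounds           # output, ST complexity
        answers = [random_oracle(q) for q in queries]  # one parallel batch
        max_state = max(max_state, len(state))         # memory between rounds
        rounds += 1                                    # one batch = one time step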
We're trying to prove security statements in this model, and here are some of its benefits, at least from my point of view. It's quite permissive, in a sense: the arbitrary non-oracle computation is given to the algorithm for free — we're not charging for it — and we're measuring only some of the memory: during the arbitrary computation we don't really know what memory is in use; we only measure the memory between calls to the random oracle. Another nice thing: remember, originally we wanted a PRAM, a parallel random access machine. Any PRAM algorithm is also a pROM algorithm with no added overhead, so if we prove security statements in the pROM model, we also get security statements for the PRAM model. And giving these things to the algorithm for free is good for security statements: if we can show that no algorithm exists that uses too little memory, space, and time even in this permissive model, then it's certainly true once we really start charging for the arbitrary computation and the details of the memory. The other nice thing is that we can actually prove things in this model, which also helps.

Now that I've told you about the model, I have to tell you a little about the complexity measure — how we measure complexity here. Remember, our goal is area times time, so essentially the product of space and time. Fix an execution in this model and ask: what's the complexity of this execution? The natural thing to do is to look at the largest state produced during the execution and at how long the execution ran, and take the product of the two. The intuition: how much area — how much space — will you need on a circuit to run this execution? You need to store at least the biggest state, so you need at least that many registers. And how much time? The number of iterations, the number of consecutive calls to the random oracle. So this is the complexity measure; it's essentially space-time, and our security statements are about this — with one caveat.

One sanity check you can ask yourself: if I give you an execution and show you that it has high ST complexity in this sense, does it follow — this is what we would hope — that a circuit running this execution has to have a large area-times-time product? This is not a formal claim; it's the transition from the formal model to the actual thing we want. So feel free to tell me later why that is or isn't true.

Now we have the complexity of an execution; what we actually want is the complexity of a function — for example, Argon2. How do we get there from the complexity of an execution? In a pretty natural way. First you have the complexity of a particular algorithm on an input, with the intuition "on input x, algorithm A almost always runs with at least this ST complexity" — there's a probability in the formal version, but that's the intuition. Then the ST complexity of a function is minimized over all algorithms and all inputs: what is the ST complexity of the best algorithm for evaluating this function, on its favorite input? It's worst-case in the right direction, which is good, because we're talking about an adversary brute-forcing passwords: we don't really know anything about the inputs, and the adversary essentially gets to choose his favorite password guesses — that's why we minimize over inputs.

But there's one issue that comes up, and it's important: parallelism doesn't play well with area-times-time complexity. AT complexity doesn't scale with the number of copies of a function you're computing, and it's not difficult to see pictorially. Imagine an algorithm computing a function, plotted as space over time: at the beginning it needs a lot of space, followed by a long tail that takes a lot of time but not much memory. This gives you high ST complexity. But if you want to compute a lot of copies of this function, then as soon as the memory frees up on the first instance, you have all this free area lying around on your circuit — you might as well use it for the next one; and because you can do things in parallel, you really don't increase the time. So you're not using much more memory and not much more time. And this actually happens: the attacks I'll show you later leverage exactly this. They don't work on a single instance of Argon2; they work on many instances, because even for a single instance there's a period where you do need a lot of memory, but for much of the time you don't — in this particular algorithm, this particular attack. So this is a real thing.
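To make the pipelining picture concrete with toy numbers of my own (not from the talk):

% One execution: space n for a single step (the burst), then space sqrt(n)
% for n further steps (the tail). On its own:
\mathrm{ST}(1) \;\approx\; n \cdot n \;=\; n^{2}.
% Stagger k = sqrt(n) executions so the bursts never coincide: the tails
% occupy k\sqrt{n} = n area in total, the time remains O(n), so per copy
\mathrm{ST}(k)/k \;\approx\; n^{2}/\sqrt{n} \;=\; n^{3/2} \;\ll\; n^{2}.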
So we do need to modify our notion of ST complexity if we want to make meaningful security statements, and we do it in the natural — almost brute-force — way, giving what I would call amortized ST complexity. We minimize over the adversary's algorithm and over its favorite set of inputs, and divide by the number of inputs it chose to evaluate. So we're essentially brute-force defining the fact that this is amortized, and that's the notion we work with: amortized ST complexity. It really is different for functions we care about, and our security statements are about this now.

The sanity check you can ask yourself here: if I give you a function and guarantee that its amortized ST complexity is large, do you now believe that implementing a brute-force attack on this function in a circuit is also expensive — not just a single copy? My understanding is that this is actually what we want from a password hashing algorithm.
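The two notions just described, in symbols (notation mine):

% ST complexity of one execution of algorithm A on input x, where
% sigma_1, ..., sigma_t are the states kept between oracle rounds:
\mathrm{ST}(A,x) \;=\; \Big(\max_{i \le t} |\sigma_i|\Big) \cdot t.
% Amortized ST complexity of a function F: minimize over pROM algorithms A
% and (multi)sets of inputs X, dividing by the number of instances:
\mathrm{aST}(F) \;=\; \min_{A,\,X}\; \frac{\mathrm{ST}(A,X)}{|X|}.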
We show that, for any σ, if you fix the time cost to τ = 1, so you iterate over memory once, which is the case you'd hope for, then what you get is that the complexity is at least σ^{5/3} (σ to the power 1.6 recurring). So we're not at the n² that we want, but we're at least at σ^{5/3}, and by complexity I again mean the amortized ST complexity of Argon2i in this parallel random oracle model. Recall that in Argon2i the structure of the calls to the compression function depends on your salt, so this is not an absolute statement: it holds over the choice of salt, but with very high probability, so for practically all salts you know your Argon2i instance has at least this amortized ST complexity. And it's Argon2i only, the data-independent variant. Then, in upcoming work, we have a different and maybe more theoretical kind of construction where we show you can actually do better than this: a particular construction with amortized ST complexity n²/log n, so you can get really close to n². And remember, the honest party needs to be able to compute this without parallelism; we give an algorithm for computing this function that essentially matches the lower bound for the parallel adversary, so we make sure there's no gap between what the honest party can do on a sequential machine and what the parallel adversary can do with all its extra power.

Okay, that was the bit about security statements; now I'll tell you a little about what we mean when we talk about attacking these functions. The first thing to keep in mind, maybe throughout the entire talk, is: when is an attack really a practical attack we should care about, and when is it more of a theory thing? So I'm going to give you enough detail that when I say "we have an attack", you can judge for yourself whether it means something in practice. Intuitively, an attack means you have an evaluation algorithm that beats the honest evaluation algorithm, which works on a sequential machine; because it's the adversary's algorithm, we allow parallelism. When I talk about the quality of an attack, I mean the complexity of the honest algorithm divided by the complexity of the attack (I had it the other way around at first, sorry), so when the quality goes up, you have a better and better attack. Now we need to modify our notion of complexity, because the one we had so far was rather permissive, and when we talk about attacks and want to know whether something is a real attack, we need to be more fine-grained. So we've been looking at two notions of complexity which try to capture, once again, the cost of building a circuit and the cost of running a circuit. Building matters because we want making these chips to be as expensive as possible. But if you look at how these kinds of brute-force devices are actually marketed, for example bitcoin mining rigs, which essentially do the same thing as password hashing, they're very often quoted in terms of their electricity consumption, because running the device is a cost you pay all along, whereas the cost of building it you can amortize over its lifespan.
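To keep the statements made so far straight, here they are in compact form; this is my shorthand, and the exact quantifiers and constants are in the respective papers:

$$\mathrm{aST}\bigl(\text{Argon2i},\ \sigma,\ \tau{=}1\bigr) \;\ge\; \sigma^{5/3} \ \text{(w.h.p. over the salt, in the pROM)}, \qquad \mathrm{aST}(f_n) \;=\; \Omega\!\left(\tfrac{n^2}{\log n}\right) \ \text{for the newer construction},$$

$$\mathrm{quality}(\mathcal{A}) \;=\; \frac{\text{cost of the honest sequential algorithm}}{\text{cost of } \mathcal{A}} .$$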
So we tried to do that: to develop a complexity notion that gives an approximation of the electricity consumption of the device. First of all, when we talk about area we now want to be more fine-grained, and not ignore the fact that we're using parallelism. Potentially we have a lot of these compression functions on the circuit, and we want to count that: how many copies of the compression function you need, as well as how much memory, and the sum of those two things is the area of the circuit we charge for. Going back to the parallel random oracle model, what I mean is that the complexity of a given execution is not only the maximum state size; we also add the maximum size of a batch of calls to the random oracle. If at some point in the execution the algorithm makes 10 calls to the random oracle in parallel, you're going to need 10 copies of your compression function on the circuit, so that should be added to the area. Otherwise we could come up with an algorithm that doesn't use a lot of memory and looks like an attack, but in reality needs crazy numbers of compression function cores on the circuit, so it's not a practical attack. But in that equation we'd be charging the same for storing one bit, a one-bit register, as for a full implementation of the compression function, BLAKE2b in the case of Argon2, and that's obviously not fair: they take different amounts of area, maybe by orders of magnitude. So what's been happening in the theory is that we use an extra parameter that tells you the ratio of these two areas, and the complexity measure is parameterized by this ratio. The idea is to make the complexity statement somewhat independent of the technology: for different technologies for implementing these chips the ratio will be somewhat different, so it's a parameter, and you plug in the value for a given technology to get the complexity. So there it is: we multiply by this parameter R, the core-memory area ratio. As for energy complexity, this now tries to approximate the cost of running a circuit, so we only charge for memory that's actively being used. Just because you have 10 gigabytes of memory on the circuit, if for most of the time you're really only using one gigabyte, we don't want to charge you the whole time for all 10. The idea is that instead of taking ST to be the area of the entire box that bounds your memory curve, we only look at the area under the curve itself; that's the intuition behind energy complexity. So instead of taking the max of the state sizes and the maximum batch size, we simply sum them. But first, sorry, we have to talk about what the actual unit of measure is here. When we talk about energy complexity, this is the intuitive unit you can have in mind.
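A sketch of the area measure just described, in the same shorthand as before: for an execution whose state after step i is σ_i and whose i-th batch makes q_i parallel oracle calls, with R the core-memory area ratio,

$$\mathrm{area}(\mathcal{A}) \;=\; \max_i \lvert\sigma_i\rvert \;+\; R \cdot \max_i q_i .$$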
I define a "tock" as the amount of time it takes to evaluate the compression function once, from start to end, without pipelining. Then the unit of measure, call it a milliwatt-tock, is the amount of electricity required to store one bit for the duration of one tock, and the core-memory area ratio is essentially the number of milliwatt-tocks you need to evaluate the compression function once. Now we can define the complexity in the parallel random oracle model: instead of taking the max over the states, we simply sum the sizes of the states, and similarly for the batches of queries. So if at the very beginning you have some huge state and make a huge batch of calls, we charge you for that once, but if for the next many iterations you're only storing small states, the subsequent summands are much smaller. Okay, that's it for the complexity measures; now I can give you some examples of the kinds of statements we've been making. For Argon2i in particular, we have an algorithm that evaluates Argon2 with both amortized energy complexity and amortized ST complexity at most n^{1.75} log n. One thing to notice is that asymptotically this doesn't look good for our goal of getting as close to n² as possible, so asymptotically I think it would be fair to call it an attack; but of course the question is what it means in practice. So, in an effort to understand what this means for the password hashing standardization, we computed the complexity of this attack in an exact sense, no longer as an asymptotic statement, and let me give you a bit of analysis of what one could read into it. Let's try to argue that the attack means nothing in practice; here are some things one might say to that end. First, if you plug into the exact security statement parameters we might want to use in practice, a memory table of size one gigabyte, which is n = 2^24, and a ratio R derived from actual specifications of 2013 ASIC technology, then as long as you're willing to do more than one pass over memory, the attack does no better than the honest algorithm. That's a fair argument. Second, maybe we need unrealistic amounts of parallelism to run this attack: remember, it's only an amortized attack, so you really have to compute many instances in parallel, and that might just blow the circuit up to something unrealistic. Third, this ST complexity only charges for part of the computation: there's still arbitrary computation between random oracle calls, and maybe our attack abuses that, maybe it does crazy stuff there. I'm going to address these three points. Let's start with the first one: these parameters were chosen because they're practically interesting parameters that we would actually want to use.
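In the same shorthand as the area measure above, the energy measure and the attack bound just quoted look roughly like this (the unit is milliwatt-tocks; again this is my rendering of the spoken statement):

$$\mathrm{energy}(\mathcal{A}) \;=\; \sum_i \lvert\sigma_i\rvert \;+\; R \cdot \sum_i q_i, \qquad \mathrm{aE}(\text{Argon2i}),\ \mathrm{aST}(\text{Argon2i}) \;\le\; O\!\bigl(n^{1.75}\log n\bigr).$$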
Okay, so the first thing to note is what we're saying when we say that with more passes over memory the attack does no better than the honest algorithm. What we're really saying is: I'm going to fix the amount of memory that this computation needs, but I'm going to double the computation the honest party does to achieve this memory hardness. So in a way we're getting a bit less than what we ideally want. What we want is the maximum memory requirement for a given amount of computation, and what we're saying here is that for this amount of computation, one pass over memory, we can't get the memory hardness we want, so we add computation without adding memory. That is what taking the attack out of the picture this way really amounts to: adding computation without adding memory requirement. Another thing to be said is that so far no attempt has been made to optimize the attack for the specific parameters that matter in practice; the analysis has used asymptotically optimal values. If we go back to the analysis for these concrete parameters, we have a chance of improving the attack in terms of exact security, and indeed, just playing around with some code to brute-force search for optimal values instead of using the asymptotic ones, we already see the analysis improve to the point where you need six iterations over memory at the one-gigabyte parameters, and there's potential for further improvement; we haven't really tried yet, so to say. Next, the objection that we might need unrealistic amounts of parallelism. Let's do a bit of calculation: how would this circuit actually look if you implemented it for concrete parameters we actually want to use in practice? For Argon2i the random oracle we use is BLAKE2b, and area-efficient ASIC implementations of BLAKE2b come in at somewhere around 0.1 square millimetres (say 0.2, but about 0.1 is the figure quoted by the Argon2 designers). Here's one way to implement the attack: you have a central ASIC which is rather big, we'll call it the big ASIC, and it's surrounded by smaller ones. If you're attacking the one-gigabyte parameter set, you want 256 of these smaller chips and the one central chip, and the central chip needs about four thousand (2^12) BLAKE2b cores, which comes to about 410 square millimetres. That's not too crazy: modern CPUs are bigger than this, we already build bigger chips. The total memory you need across all the ASICs is on the order of 50 gigabytes. Now, I'm not professing to know enough about low-level implementation to really judge this well, but my feeling is that these are not particularly unrealistic requirements, especially for a well-motivated attacker who's willing to invest in password cracking. This is something we might be able to build with real technology, is what I'm saying.
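For concreteness, here is the back-of-envelope arithmetic behind that layout as a small script. All the constants are the ones quoted in the talk (0.1 mm² per core, 2^12 cores, 256 satellite chips, roughly 50 GB in total) and should be read as illustrative, not measured:

```python
# Back-of-envelope sizing for the Argon2i attack circuit sketched above.
# Constants are the figures quoted in the talk, not measured values.

CORE_AREA_MM2 = 0.1          # area-efficient ASIC BLAKE2b core
N_CORES_BIG_ASIC = 2 ** 12   # cores on the central "big" ASIC
N_SMALL_ASICS = 256          # satellite chips around it
HONEST_TABLE_GIB = 1         # honest Argon2i memory cost
TOTAL_MEMORY_GB = 50         # memory across all chips, as quoted

big_asic_area = N_CORES_BIG_ASIC * CORE_AREA_MM2
print(f"central ASIC compute area: ~{big_asic_area:.0f} mm^2")   # ~410 mm^2
print(f"satellite chips: {N_SMALL_ASICS}")

# For scale: the 1 GiB honest table is n = 2^24 blocks of 64 bytes.
n_blocks = 2 ** 24
print(f"honest table: {n_blocks * 64 / 2**30:.0f} GiB")          # 1 GiB

# Memory blow-up of the amortized attack hardware relative to one instance:
print(f"attack memory vs honest memory: ~{TOTAL_MEMORY_GB / HONEST_TABLE_GIB:.0f}x")
```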
So, wrapping up: this attack is neither apocalyptic nor purely theoretical, in my opinion, and this is now my opinion, so to say. Could it be improved? Yes, I believe it could, both in an asymptotic sense and in an exact sense; asymptotically, in fact, I think we already have something that improves on it. The analysis could be tightened, especially for particular parameters, and we could try to optimize the constants. On the other hand, what else could we use for this password hashing standard? Maybe balloon hashing: it came after Argon2i, and our attack does just as well against it asymptotically but not as well for concrete parameters, so that might be an option for us to consider. Or something new altogether: the theory is pointing towards ways of constructing iMHFs for which the attack really doesn't work, where we really get almost the n² asymptotics, and maybe we can push that into something more concrete. As far as the more general question of the direction of the theory goes, well, that's really what I want to hear from you: does this parallel random oracle model, do these security statements, mean anything to you? How can we do things better? I think that's it.

So we have a little bit of time for questions or comments from the floor. Let me ask one. Kenny Paterson. Thank you very much for the presentation; I think it was very informative, and I appreciate the work you're doing to really try to define good models here. Do you think CFRG should pause its standardization of Argon2?

I do. I hesitate to say it, because I don't want to throw a wrench into the works for no reason, but I honestly do. There are very promising results coming out of the theory that point towards it being quite possible, in the fairly near future, to come up with algorithms that have much stronger theoretical underpinnings. That is what I believe.

If I can ask a follow-up question: do you think the password hashing competition should be rerun once that theory is better established?

I think that would be quite beneficial, yes, because there's a lot more understanding now. One of the things coming out of the theory is that there has been, let's say from a theoretical point of view, a basic design error that was repeated again and again in practically all of the iMHFs that were submitted. This has now become clear; this attack highlights it. So that's one lesson we could learn, and it would be nice to see whether we can push it into something more practical. And I think theory people are not necessarily best equipped to come up with a practical design, so it would be nice to have a competition with practitioners involved, to bring this theory into something practical.

Very nice talk, thank you. Bart Preneel, University of Leuven. I want to get some clarification, because there is some work you may not be aware of, by Michael Wiener from the mid-'90s, on the full cost of computations. It is also an asymptotic analysis, and one of his general comments is that memory access times are very important as well, not just compute costs. But I never saw here anything like how many times you write to your memory, or the access pattern in your memory, because
it seems that if you're going to build concrete designs, this may completely dominate both your energy consumption and your hardware cost. And I know that in the context of memory-bound functions the complexity measure was the number of cache misses.

I think this might be along those lines. My understanding, and I have to admit that coming from the theory side I don't know enough about concrete memory architectures in ASICs, is that this notion of cache misses is more concerned with a layered memory architecture: missing L2, missing L1, going out to slow memory. I'm not sure how relevant that is for ASICs; maybe it is, I don't know. The other thing is that in the dynamic MHF case, something like scrypt or Argon2d, I think we will actually get a lot of cache misses, because at every single step, at every call to the random oracle, to the compression function, we need as input what is up to that point a random position in our memory table, which is very big and presumably will not fit into fast memory. So although we haven't analyzed it explicitly for scrypt or Argon2d, I think they will also incur a lot of cache misses. The static case is much harder to analyze, I think, because the entire structure of the mode of operation over the compression function is known a priori to the attacker. I don't know of a specific analysis of cache misses there, but just off the cuff, if you have random accesses I don't think a cache will help you at all; it will waste your time, because you're just updating it all the time.

But I guess there is still the point that if you have to access a large quantity of data in memory, you actually need to invest a lot in the access network; that's the point he wants to make. The other question is: it wasn't clear to me whether the number of BLAKE cores is the result of an optimization, where you say "I need this many", or whether you just took as many as you could fit on a chip. Is there an optimization?

In this particular case the attack was targeted at Argon2i but not at a specific choice of parameters... actually, that's not true, it was targeted at those particular parameters: the layout I showed is specifically for attacking the one-gigabyte parameter set. However, I think one could design a chip that would work for all reasonable parameters and still remain a reasonable chip design, without blowing up too much in complexity; but the design I mentioned, with exactly 256 of the smaller chips, is optimal for a certain choice of parameters.

Okay, thank you. Any more comments or questions before we move on? Okay, let's thank Joël again. Thank you very much. Okay, so moving along to our third and final presentation: Andreas Hülsing is going to present, briefly, on XMSS. Thanks, Andreas.

Okay, so this is work with Denis Butin, Stefan-Lukas Gazdag and Aziz Mohaisen, and we wrote an internet draft for hash-based signature schemes. The reason to care about them is that they are post-quantum. What's nice is that they only need secure hash functions: any signature scheme needs a secure hash function to compress the message in the beginning anyway, and we just get rid of the additional intractability assumption.
The security of hash functions is quite well understood, at least if we consider generic attacks, where we even have complexity lower bounds for quantum attacks. And these schemes are relatively fast: we have implementations that can outperform RSA, for example, on the same platform.

A brief overview. In a Merkle-style hash-based signature scheme you start with a one-time signature scheme, for example Lamport's scheme, which I guess some of you will know. You generate many one-time key pairs, build a binary hash tree on top, and your new public key becomes the root of this tree; the leaves are the one-time public keys. Now, the secret key of the scheme consists of all the one-time secret keys, but these are just random bit strings, so you can generate them pseudorandomly and store only a short seed. If we want to sign messages, we use these key pairs one by one; that's why these schemes are stateful, because we have to prevent using the same one-time key pair twice. So if we sign, say, the second message, we use the second one-time key pair to generate a one-time signature, and we put the one-time public key and the one-time signature into the tree signature. We also add the nodes which a verifier can later use, given the one-time public key, to compute an alternative root value. If this computed value matches the value in the public key, the one-time public key is authentic, and if the one-time signature also verifies, the signature is correct. (There's a toy sketch of this structure below.)

XMSS is an improvement of that scheme, from 2011. In the tree, the construction is slightly changed to use bitmasks which are XORed in before the hash function is called; this allows the security requirement to be reduced from collision resistance to second-preimage resistance. It uses a particular one-time signature scheme, a variant of the Winternitz one-time signature scheme, which has the same goal of collision resilience, and for the message digest we use randomized hashing. The goal of all of this, collision resilience, is not only nice from a theoretical point of view; it also allows us to halve the signature size, because the signature consists simply of a bunch of hash values, and if we don't require collision resistance we can use hash functions with half the output size. Then we have a multi-tree version, which is a kind of certification structure: we simply have many trees, the tree on the top layer is used to sign the root nodes of the trees on the next layer, and so on, until on the bottom layer the leaves are used to sign the messages. The reason for this is that if you want to do a big number of signatures with one key pair, you need a tree of quite some height: with height h you can do 2^h signatures with one key pair, but key generation is linear in that number. With this so-called tree chaining, using d layers, key generation comes at a cost on the order of d · 2^{h/d}; for example, with h = 60 and d = 3 that's 3 · 2^20 instead of 2^60. With some algorithmic improvements it also reduces the worst-case signing times.

Okay, so the scheme we're actually getting into the standard now is not the scheme we published at PQCrypto 2011. We added something called multi-target attack resistance, which means we key the hash functions inside the construction, and we also use different bitmasks for all the hash function calls. This allows us to give a tight security reduction, which in turn allows us to select parameters such that we get smaller signatures at the same security level.
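To make the tree structure concrete, here is the promised minimal toy sketch in Python: a textbook Lamport one-time scheme under a four-leaf Merkle tree. Everything is simplified for illustration; this is not XMSS (no bitmasks, no Winternitz chains, no pseudorandom key generation, no serious parameters):

```python
# Toy Merkle hash-based signatures: Lamport OTS + a height-2 hash tree.
import hashlib, os

H = lambda *parts: hashlib.sha256(b"".join(parts)).digest()
N = 256  # bit length of the digests signed by the one-time scheme

def lamport_keygen():
    sk = [[os.urandom(32) for _ in range(N)] for _ in range(2)]
    pk = [[H(x) for x in row] for row in sk]
    return sk, pk

def lamport_sign(sk, digest):
    bits = [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(N)]
    return [sk[b][i] for i, b in enumerate(bits)]  # reveal one secret per bit

def lamport_verify(pk, digest, sig):
    bits = [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(N)]
    return all(H(s) == pk[b][i] for i, (s, b) in enumerate(zip(sig, bits)))

def pk_hash(pk):  # compress a one-time public key into one tree leaf
    return H(*[x for row in pk for x in row])

# Key generation: 4 one-time key pairs; the public key is the tree root.
ots = [lamport_keygen() for _ in range(4)]
leaves = [pk_hash(pk) for _, pk in ots]
level1 = [H(leaves[0], leaves[1]), H(leaves[2], leaves[3])]
root = H(level1[0], level1[1])

def sign(idx, msg):  # stateful: idx must never be reused!
    sk, pk = ots[idx]
    auth = [leaves[idx ^ 1], level1[(idx >> 1) ^ 1]]  # sibling path to root
    return idx, lamport_sign(sk, H(msg)), pk, auth

def verify(msg, signature, root):
    idx, ots_sig, pk, auth = signature
    if not lamport_verify(pk, H(msg), ots_sig):
        return False
    node = pk_hash(pk)  # recompute the root from the leaf and the auth path
    node = H(node, auth[0]) if idx % 2 == 0 else H(auth[0], node)
    node = H(node, auth[1]) if (idx >> 1) % 2 == 0 else H(auth[1], node)
    return node == root

sig = sign(2, b"hello")
assert verify(b"hello", sig, root)
```

The `auth` list plays the role of the extra nodes in the tree signature described above: the verifier hashes its way from the one-time public key back up to the root and compares against the public key.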
So if you're looking for the scientific reference, you actually have to look at our PKC paper from this year, not the 2011 paper. As I said, the scheme we're currently standardizing is stateful, but it's the main building block of SPHINCS, the most performant stateless hash-based signature scheme we have at the moment, so stateless signatures can build on this draft later on.

Let me go over some recent changes we made before the Buenos Aires meeting; these have already made it into the current version of the draft. We changed the randomized message hashing a bit, again to get multi-target attack resistance. The motivation is this: if you do plain randomized hashing, applying H to some randomness concatenated with the message, then after seeing q signatures an attacker just has to find one pair (r, M) such that H(r || M) matches any one of the q values H(r_i || M_i). The reason is that r_i is not authenticated in any way, so the attacker is free to choose it, and the security level for an n-bit hash function would actually be n minus log q bits. What we did is simply add the index of the one-time key pair in use, for domain separation: the hash call is now H(r_i || i || M_i). In practice this prevents multi-target attacks. We don't really have a formal proof, in particular no standard-model proof, but if you take this as a property and analyze it in the random oracle model, it's straightforward to show that it gives you domain separation, and the attack complexity is 2^n again. Okay, I'll skip the addressing scheme, because that requires too much knowledge of the draft.

Then we have some final changes we still want to make. You can think of these hash-based signatures as a kind of protocol that uses some hash function, so to instantiate it you have to select the hash function. The draft currently has two parameter sets, or instantiations: one using SHA2-256 together with ChaCha20 for everything pseudorandom, and one that builds all the functions from SHA2-512. We were asked to add SHA3. We were also thinking of making SHA2-512 optional, because 256 bits of post-quantum security seemed a bit far-fetched, and we got the comment that we should have a pure SHA2-256 mode, since NIST has already standardized SHA2-256, which makes it easier for companies to use the scheme for compliance reasons, and the code size goes down too. So what we will do is have one single SHA2-256 parameter set, which is mandatory, and then provide as optional a SHA2-512 parameter set and two parameter sets based on SHA3. One of the reasons we threw out ChaCha is that we also had to change the addressing scheme, which is used to pseudorandomly generate all the hash function keys and bitmasks in the scheme, so that it can be reused for other things later, and after that change ChaCha suddenly doesn't give us enough space to process the addresses.

Okay, the last thing, which might again be a bit more interesting for cryptographers: we need randomness for the randomized hashing.
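Putting the before-and-after of the message-hashing change above side by side (n is the hash output length in bits, q the number of signatures the attacker has observed; the 2^n figure is the random-oracle heuristic just mentioned, not a standard-model theorem):

$$\text{before: } d = H(r \,\|\, M), \ \text{forgery cost} \approx 2^{\,n - \log_2 q}; \qquad \text{after: } d = H(r_i \,\|\, i \,\|\, M_i), \ \text{forgery cost} \approx 2^{\,n}.$$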
The standard way to generate this randomness in a deterministic manner is to have a dedicated secret key value that is used only for this, and to apply a PRF keyed with that value to the message to derive r. The drawback is that on small devices, for example ones that cannot hold the whole message, you then have to stream the message in twice: once to generate r, and later to actually compute the hash. What we can do with stateful schemes is replace the message by the index of the one-time key pair in use; this allows us to process the message only once, but it differs a bit from other schemes, so that might be a drawback. And actually, one thing we probably will also change: we will add the public key to that hash.

So we have time for questions and comments for Andreas. Somebody's coming.

Hi, Philipp, EPFL. You said the ChaCha permutation was too small, as far as you understood; have you considered switching to something bigger, for example the permutation from BLAKE2?

No, in this case we didn't, because we had already decided we would probably get rid of it, simply because of code size: it's easier to implement if you only have to implement one function instead of two. In our reference implementation, for example, we currently use OpenSSL for SHA-2 and then have the ChaCha permutation in there in addition, and we don't even use it in a standard way any more.

We have seen here that there is still a lot of ongoing research on hash-based signatures: the first publication was in 2011, and then you added a lot of modifications to get smaller signatures and all that. From your perspective, when is the right point to standardize such a scheme in CFRG, and why?

So this last change, XMSS-T, was actually motivated by the standardization process. From a theory point of view the improvement is negligible, but you have those compliance issues: you really need security levels of 128 and 256 bits, and if you're slightly below, because your reduction is not entirely tight and there actually are attacks, then it simply looks ugly and people will complain, and people did complain.

Sure, but my question is more about quantum. I think the main selling point is resistance to quantum computers, and quantum computers have not been built; they will be built in five years at the earliest. So why not standardize in five years? Why do you think now is the right point in time?

Because if we standardize now: this has already taken something like two years, I would guess there will be one more year until everything is done, and then until the stuff gets deployed in the field it will be something like five to ten years. Companies are starting to experiment with hash-based signatures now; I know of some big companies that have actually started. The point is just to have an option available, and for hash-based signatures we don't have the issue we have with, say, lattice-based crypto, where the security estimates are still moving. So that's the reason I would say do it now: then we have this option. We definitely also need an encryption alternative, and I think the situation is much more pressing there. For signatures, you just need the scheme by the time the quantum computer is built, because you can still re-sign archives; for encryption you don't have that. So I would actually hope that we soon get some post-quantum candidate encryption scheme that we are confident enough in to say: okay, for the moment, this is something we could use.
Are we all done? Okay, let's thank Andreas again. Thank you very much. And I should thank Andreas publicly for coming and giving the talk; I think he was here at the conference anyway, but I did twist his arm into coming along and giving us yet another update on hash-based signatures, so thanks very much, Andreas, for doing that.

Okay, so now we're almost done. I just wanted to give people the opportunity to ask questions; this is the any-other-business section of the meeting, so if anybody wants to raise any issues, bring anything to my attention, or ask any questions about CFRG, now is the time to do that. While you're all thinking of your questions, maybe I'll ask you one as an audience: has this been useful as an introduction to CFRG and what we do, and would you like to see something like this at a future cryptography conference, Eurocrypt or maybe even Crypto this year? Maybe a quick show of hands from people who found this useful and would like to see more of this kind of thing. Okay. Does anybody want to see less of this kind of thing, anyone who thinks it's a waste of time? Maybe the people who already left; a few people went, but I guess they had flights to catch. That's the optimistic view. Okay, so that's fairly positive feedback; not very scientific, but useful to get a show of hands. Any other business? Any other questions or comments? Okay. We've had some note-takers busy in the middle of the room, and we'll eventually be posting notes on the CFRG website. The next CFRG meeting will be in Berlin in the third week of July; maybe some of you will be tempted to come along. So let me just close by saying thank you for coming today, thanks for making it interactive and useful, and have a safe journey. Thank you very much.