Thank you. Okay, so passwords are one of the most prevalent forms of authentication in use today. In a traditional password-based authentication scheme, the user first chooses a password and registers it with the scheme. That password is hashed with a slow salted hash function, and the resulting digest is stored. Then in the future, when the user wishes to authenticate, he re-enters his password; the password is hashed with the same slow salted hash, and authentication is granted if the resulting hash matches the one stored. So you can see that if the password is entered with any kind of small typo or error, authentication will fail. Throughout this work we model the slow salted hash function as a random oracle with some associated cost C. This cost is important because we measure the runtime of all of our algorithms and attacks in terms of the number of slow hash queries made, since this is very much the dominant cost. As the scheme is written, authentication requires one slow hash computation, so if the hashing cost is C, the runtime of authentication is also C. When it comes to attacks against these schemes, there are two main vectors we're interested in: online and offline attacks. An online attack is when an attacker attempts to impersonate the user via the login API of the system. While this is a very real threat, there are techniques we can use, such as locking the account after a certain number of incorrect password submissions, to keep the number of guesses an attacker can make in this kind of attack very small. In contrast, in an offline attack, the attacker has compromised the password hash database and tries to crack the passwords by brute force.
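The basic scheme can be sketched in a few lines of Python, with PBKDF2 iterations standing in for the slow salted hash / random oracle; the cost parameter and the example password are illustrative, not values from the talk:

```python
import hashlib
import os

# Illustrative cost parameter C: the iteration count of the underlying fast
# hash stands in for the cost of one slow hash query.
COST = 10_000

def slow_salted_hash(password: str, salt: bytes, cost: int = COST) -> bytes:
    """Model of the slow salted hash (one 'slow hash query')."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, cost)

def register(password: str):
    """Registration: choose a salt and store (salt, digest)."""
    salt = os.urandom(16)
    return salt, slow_salted_hash(password, salt)

def authenticate(entered: str, salt: bytes, digest: bytes) -> bool:
    """Authentication: one slow hash computation; an exact match is required."""
    return slow_salted_hash(entered, salt) == digest

salt, digest = register("p@ssword1")
assert authenticate("p@ssword1", salt, digest)      # correct password succeeds
assert not authenticate("P@SSWORD1", salt, digest)  # any typo fails
```

Note that authentication makes exactly one slow hash query, so its runtime equals the hashing cost C.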
In this setting he can make a huge number of guesses, the only real limit being the amount of time he's willing to invest in the process. In this work we focus on the offline case, and there have been many high-profile password breaches of this form that illustrate it is a very real threat in practice. We formalise offline attacks in the following game. We sample a password according to the password distribution, which models a user choosing their password. That password is then registered with the scheme, producing the hash digest, which we label here with an H; depending on the scheme, some additional information may also be stored, which we show here with an S. The attacker learns the hash digest and any other stored information, and he tries to recover the password by brute force, submitting guesses to the random oracle; he wins if the correct password is amongst the guesses he submits. The number of queries he can make is dictated by the hashing cost: if the attacker is willing to invest time T in this process and the hashing cost is C, then he can make Q = T / C queries within his attack runtime. We assume, conservatively, that the attacker has complete knowledge of the underlying password distribution, and it's not hard to see that his best strategy is simply to query the Q most probable passwords according to the distribution, so his attack success probability is equal to the weight of those Q heaviest passwords. So just to recap, we've considered two key metrics so far: the authentication runtime and the level of security achieved by the scheme.
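The game's optimal attack is easy to simulate; here is a toy version with a made-up distribution, where the success probability is the weight of the Q = T/C heaviest passwords:

```python
# Toy password distribution (probabilities sum to 1); all values hypothetical.
dist = {"123456": 0.30, "password": 0.20, "qwerty": 0.15,
        "letmein": 0.10, "dragon": 0.10, "monkey": 0.08, "abc123": 0.07}

def offline_success(dist, T, C):
    """Best offline attack: query the Q = T // C most probable passwords;
    the success probability is their total weight."""
    Q = T // C
    heaviest = sorted(dist.values(), reverse=True)[:Q]
    return sum(heaviest)

# With budget T and hashing cost C the attacker gets Q = 3 guesses here,
# covering the three heaviest passwords.
p = offline_success(dist, T=30_000, C=10_000)
assert abs(p - (0.30 + 0.20 + 0.15)) < 1e-9
```

Halving the hashing cost C doubles Q and therefore strictly increases the success probability whenever extra passwords carry positive weight.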
There's an inherent trade-off between the two, represented by the hashing cost: we want the hashing cost C to be large enough that offline attacks are hard, but at the same time we can't make it too large or authentication becomes unreasonably slow for a legitimate user, so it's a constant balancing act between the two. There's a further metric we haven't really spoken about so far, which is usability. In the scheme I've just described, the user must enter their password exactly for authentication to succeed. But we know that users make a lot of typos, and this represents a real pain point, both for users and for the service providers whose systems it keeps them locked out of. More recently, beginning with this work by Chatterjee et al. in 2016, there's been a drive to add typo-tolerance to password-based authentication schemes, so that the user may authenticate under a small set of allowed typos. To give some context: they show in a study that just five simple typos are responsible for 10% of all failed login attempts at Dropbox, and that 3% of users are turned away. They offer an approach called relaxed checking, which works well when we're just trying to correct a small set of typos like this, but it's not clear how security extends when we wish to correct more typos. And this is something we really would like to be able to do, because they also show in their studies that if we could correct, for example, 64 typos, we could fix half of all of those failed login attempts at Dropbox, which would be a really significant increase in usability. So this is the motivation for this work: how can we correct more typos securely than is possible with relaxed checking? Before we can really talk about correcting typos, we need to define what a typo is, so to the password distribution we associate a typo model.
For each possible typo, we define the ball around the typo to be the set of passwords we will accept it as a typo of. To give an example: if the oddly capitalised string on the left is a typo, we could quickly generate its ball, this list of candidate passwords, by applying corrector functions based on the kinds of typos we know users make, for example undoing caps lock and flipping the case of the first letter. We will allow authentication under some typo if the real password lies in its ball. Now, clearly we have to define this typo model carefully, because a scheme that accepts just any string in place of the password is going to have very bad online security. In the paper we discuss how to do this; going forward we assume that we have a good typo model, so online security is taken care of and we focus on the offline case. The relaxed checking approach I just mentioned is a simple way of adding typo-tolerance to a password-based authentication scheme. Registration works just as before, but now at authentication we don't just compare the string that is entered: we generate its ball and hash all of the strings in the ball looking for a match, and the user can authenticate if a match is found. You can see in this example that the user has successfully authenticated even though they entered their password with a typo. Now, this typo-tolerance comes at the cost of doing more work, because the amount of slow hash computation required is equal to the number of points in the ball. It follows that if balls are of size beta and we want to achieve some runtime RT, we need to set the hashing cost to C = RT / beta, so we're having to use a faster hash function.
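A minimal sketch of ball generation and relaxed checking, assuming a toy typo model with just the two corrector functions mentioned (caps lock and a first-letter case flip); the passwords and the cost parameter are illustrative:

```python
import hashlib
import os

def correctors():
    """Hypothetical corrector functions for two common typo classes:
    caps lock left on, and the case of the first character flipped."""
    undo_caps_lock = lambda s: s.swapcase()
    flip_first = lambda s: s[:1].swapcase() + s[1:]
    return [undo_caps_lock, flip_first]

def ball(entered: str):
    """Ball of an entered string: the set of passwords we would accept it as
    a typo of (including the string itself)."""
    return {entered} | {f(entered) for f in correctors()}

def slow_hash(pw: str, salt: bytes, cost: int = 2_000) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", pw.encode(), salt, cost)

def authenticate_relaxed(entered: str, salt: bytes, digest: bytes) -> bool:
    """Relaxed checking: hash every point in the ball, so authentication now
    needs up to |ball| = beta slow hash computations instead of one."""
    return any(slow_hash(p, salt) == digest for p in ball(entered))

salt = os.urandom(16)
digest = slow_hash("Secret123", salt)
assert authenticate_relaxed("sECRET123", salt, digest)    # caps-lock typo accepted
assert not authenticate_relaxed("Sekret123", salt, digest)  # not in the typo model
```

To keep the total authentication runtime at RT, the per-hash cost must shrink to RT / beta as the ball grows.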
As before, the best attack is simply to query as many of the most probable points as the attacker can afford within his runtime, but since we're now using a hash function that is beta times faster, he can try beta times as many points, and we get this increase in the offline success probability. So this is the question we want to explore in this work: we know we want to correct more typos to improve usability, but with relaxed checking this means using a faster hash function to preserve the runtime, which in turn speeds up offline attacks. We want to see whether we can correct more errors securely than is possible with relaxed checking. To do this, we explore the use of secure sketches. Our first contribution is a new sketch which is close to optimal and has improved security bounds over the previous best known construction, from this paper by Fuller et al. We show how to build sketch-assisted checkers, and in order to analyse them we extend sketch security to the setting in which the attacker can make multiple guesses. We also define a new, more sophisticated variant of relaxed checking, and then we develop a framework within which to compare these approaches. The results we find are somewhat surprising, in that the seemingly most rudimentary relaxed checking scheme turns out to offer a better time/security trade-off than the more sophisticated schemes, particularly those with sketches. The intuition is that when we extend sketch security to the multi-guess setting, we find that security often deteriorates much more quickly than one might expect. Let's dig into what this means a bit more. Secure sketches are designed for settings in which secrets are measured in a noisy fashion, and we'd like to be able to reconstruct the secret from a noisy reading of it.
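The relaxed-checking speed-up described above can be illustrated numerically; all figures here (the distribution, the runtimes, the ball size) are hypothetical:

```python
# Hypothetical password distribution (weights sum to 1).
dist = {"123456": 0.25, "password": 0.20, "qwerty": 0.15, "letmein": 0.10,
        "dragon": 0.08, "monkey": 0.08, "abc123": 0.07, "iloveyou": 0.07}

def attack_success(dist, T, C):
    """Weight of the Q = T / C most probable passwords."""
    Q = int(T // C)
    return sum(sorted(dist.values(), reverse=True)[:Q])

RT = 10_000   # target authentication runtime (arbitrary units)
T = 30_000    # attacker's runtime budget
beta = 4      # ball size under the typo model

# Exact checking uses cost C = RT; relaxed checking must drop to C = RT / beta
# to preserve the same authentication runtime, giving beta times more guesses.
p_exact = attack_success(dist, T, RT)            # Q = 3 guesses
p_relaxed = attack_success(dist, T, RT / beta)   # Q = 12 guesses
assert p_relaxed > p_exact
```

Here the attacker's coverage of the distribution jumps from the 3 heaviest passwords to the whole toy distribution, which is exactly the security loss relaxed checking pays for its typo-tolerance.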
We focus on the distribution-sensitive setting, in which the distribution and typo model are known at the time of building the sketch. This isn't always a reasonable assumption, but it is with passwords, because we have so much good data from password leaks that we can model these distributions pretty accurately. The key idea is that we store a little piece of information about the noisy secret, which we call a sketch, to aid the reconstruction process. So we have a sketching algorithm that takes as input a password and returns a sketch of that password, and a recovery algorithm that takes as input a string and a sketch and outputs a guess at the password underlying the sketch. What we want in terms of correctness is a guarantee that if the real password lies in the ball of the entered string, then this recovery process succeeds with probability greater than or equal to 1 minus delta. In terms of security, we want to make sure that learning a sketch does not make it significantly easier to guess the underlying point. We model this by sampling a password, sketching it, and giving the attacker the sketch; he then outputs a single guess at the underlying point. We measure this in terms of the average-case min-entropy, which captures how well an unbounded attacker can guess the password given a sketch and a single guess.
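To make the interface concrete, here is a minimal Python sketch of the sketching and recovery algorithms, with a toy typo model and a truncated salted hash standing in for a real secure sketch; every name and parameter here is illustrative, not the construction from the paper:

```python
import hashlib
import os

def ball(entered):
    """Toy typo model (hypothetical): a string is accepted as a typo of
    itself or of its case-swapped form, as with a caps-lock error."""
    return {entered, entered.swapcase()}

def SS(w):
    """Sketching algorithm: here the sketch is a salt plus a short truncated
    hash of w -- a simplified stand-in for a real secure sketch."""
    salt = os.urandom(8)
    return salt + hashlib.sha256(salt + w.encode()).digest()[:4]

def Rec(entered, sketch):
    """Recovery algorithm: search the ball of the entered string for a point
    consistent with the sketch; output a guess at the underlying password."""
    salt, tag = sketch[:8], sketch[8:]
    for w in ball(entered):
        if hashlib.sha256(salt + w.encode()).digest()[:4] == tag:
            return w
    return None

# Correctness: if the real password lies in the ball of the entered string,
# recovery succeeds, up to a small error probability delta (hash collisions).
s = SS("Secret123")
assert Rec("sECRET123", s) == "Secret123"
```

The security question is then exactly the one above: how much easier does seeing the sketch make guessing the underlying password?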
In terms of what we can hope for with sketch security, Fuller et al. in their paper identify a property of distributions and typo models called the fuzzy min-entropy, which is an analogue of min-entropy for the fuzzy setting, and they show that the best possible security a sketch can achieve is this top bound: the fuzzy min-entropy minus log 1 minus delta. They give a clever construction based on universal hash functions which gets pretty close to the optimal bound, except for the term highlighted in red. What we show is that it's possible to dispense with this problematic term and get substantially closer to the theoretical optimal bound, with only this last term shown here. This saving of entropy is very important, especially with passwords, where every bit of entropy counts. We call our construction the layer-hiding hash, and it works as follows. We divide the password distribution into layers such that points within a layer are similar in weight, and we associate to each layer a family of strongly universal hash functions, where the output length depends on the layer in which the point lies: the more probable the point, the shorter the hash we use, to try to leak less information about it. To sketch a point, we choose a salt and simply hash the point with the hash function from the appropriate layer. Up to this point we have more or less described the scheme of Fuller et al.; they essentially stop here and output this, because for their scheme to achieve correctness they need to explicitly reveal the layer in which the point lies. What we show is that if you use strongly universal hash functions and tweak the layering scheme a bit, this is no longer necessary, so we hide the layer. We do this by padding the sketch up to some fixed length with random bits, and the upshot is that all sketches now look the same length regardless of the layer in which the underlying point lies, and this is where the layer-hiding hash name
comes into play. When it comes to recovery, the recovery algorithm is presented with something like this: it has an entered point and a sketch, and it's not immediately obvious in which layer the underlying point lies, nor which hash function we should be using. To get around this, the recovery algorithm simply generates the ball of the entered string, hashes all of the points in it with the appropriate hash function for the layer in which they lie, compares these to the sketch truncated to the appropriate length, and outputs the first match it finds. We show that correctness follows from the strong universality of the hash and a careful choice of parameters, and by not explicitly revealing the layer in which the point lies, we save those extra bits of entropy and achieve the claimed security bound. So this is great: we have a new sketch which is closer to optimal than ever before, and we're now going to use it to build a sketch-assisted checker. Essentially, we bootstrap a secure sketch onto the password-based authentication scheme. At registration we also run the password through the sketching algorithm and store the sketch along with the digest. At authentication, in addition to comparing the entered string, we also run that string and the sketch through the recovery algorithm and test the output of recovery as well; the user may authenticate if either matches. You can see here that even if the user enters the password with a typo, as long as the recovery algorithm can correct the typo, he can still authenticate. This does highlight a slight drawback of sketches, in that we have this delta possibility of error, whereas with relaxed checking we will always recover the correct point. On the upside, we're now only having to make two of these slow hash computations, so we can use a much slower and more secure hash function, which is better for security; but this does come at
the cost of having to store the additional information in the form of the sketch, and the attacker can use the sketch to streamline his offline brute-force attacks. To analyse this, we extend sketch security to the multi-guess setting: we define the q-conditional min-entropy, which captures how well an attacker can recover the password given a sketch and q guesses. This is often much more realistic, but it has been rather overlooked in the literature. And what we find, giving a hint at the negative result to come with sketches, is that when we take the FRS and layer-hiding hash sketches, which are close to optimal in the single-guess case, and analyse them with respect to multiple guesses, the gap between the theoretical optimal security and what they achieve widens. Okay, so finally we throw one more scheme into the mix, which we call popularity-proportional hashing; it's essentially relaxed checking, but distribution-sensitive. We split the distribution into layers, and each layer gets a different hashing cost: the more probable the point, the higher the cost with which it is hashed. We recover by brute-force ball search, but use these different hashing costs in the process. So we have three schemes, just to recap. The simplest is relaxed checking, which corrects errors by brute-force ball search with a fixed hashing cost; then we have PPH, which is a brute-force ball search with varying hashing costs; and then we have sketch-assisted checking, with the layer-hiding hash and the Fuller et al. sketches. We want to know which of these schemes offers the best time/security trade-off, and we analyse this in the following framework: we fix a distribution and typo model that we want to correct, and then we set the hashing costs so that all the schemes achieve the same authentication runtime. For example, here we will be using a much faster hash function for relaxed checking than we will with the sketches. With this in place, we then compare the password recovery
security, and what we find is that PPH always offers a better trade-off than sketch-assisted checking, and this holds regardless of the underlying distribution. We further show that, not for all distributions, but for many of the real-world ones we're interested in, such as passwords, the seemingly simplest relaxed checking actually offers the best trade-off of all. Just to give some quick intuition about these results: for the first comparison, sketch security usually focuses on upper bounds on the attack success probability, but to conclusively prove that PPH offers a better trade-off we need a lower bound on the attack success probability, and again this is something that isn't often considered in the literature. To do this, in the paper we show how to reduce the sketch guessing game to a weighted balls-and-bins experiment, and from this we extract a lower bound on the attack success probability against sketch-assisted checking. We use this to show that PPH always offers a higher level of security for the same authentication runtime, and we emphasise that this holds regardless of the underlying distribution and the attack runtime. Our second result isn't unconditional in this way, but it does hold for many of the distributions we'll actually want to correct in practice. We prove it by comparing how the attack success probability for the schemes grows with the attack runtime. What we find is that for PPH and sketches, the attack success probability grows linearly with the runtime, in contrast to relaxed checking, where for small runtimes we see quite a dramatic speed-up which then levels off. So at some point there is a crossover, after which relaxed checking offers a better trade-off, and we find that this crossover often occurs within the attack runtimes we're concerned about for passwords and many other real-world distributions. So it turns out that in these settings relaxed
checking gives the best trade-off of all. In conclusion, we explore new techniques to try to correct more typos securely. We especially look at sketch-assisted checking: we construct a new, close-to-optimal secure sketch, and we build some new schemes. We find, somewhat surprisingly, that the simplest relaxed checking approach actually offers the best trade-off of all, and in the sketch-assisted checking case this is due to the fact that when we consider sketch security in the multi-guess setting, security degrades much more quickly than one might expect. Just as a caveat, we're not saying that secure sketches are useless: our results hold when a brute-force ball search is feasible, and when we're concerned about computational adversaries who are bounded in the number of random oracle queries they can make. Furthermore, we've considered these specific sketch constructions. We conjecture that our results hold for all sketches, in the sense that any intuition that allows us to design a better secure sketch could be used to design an analogous relaxed-checking-type scheme that will again achieve a better trade-off, but proving that remains an interesting open problem. More generally, it would be interesting to explore sketch security in the multi-guess setting further, and to find out whether this gap between optimal and achievable security when the attacker gets multiple guesses is inherent, or whether we can find a sketch which is close to optimal in the multi-guess setting. That's all from me; thank you for listening.