The next talk is called "Cycle Slicer: An Algorithm for Building Permutations on Special Domains," by Sarah Miracle and Scott Yilek from the University of St. Thomas. And Sarah is giving the talk.

Thank you very much for the introduction. I just want to reiterate that this is joint work with Scott Yilek. I'm going to be talking about two problems today that are related to format-preserving encryption, and I'll start with a little bit of background and an example. Suppose you have a database storing millions of US social security numbers. These have all of the constraints that social security numbers have: they're nine-digit numbers, and certain leading digits aren't allowed — for example, the first three digits can't be 666. So this is an existing database, and the question is: you want to go back and add encryption on top; how do we do that? One approach is to represent these social security numbers as 30-bit numbers, pad with zeros, and then use some standard block cipher. The problem with this in certain applications is that the encrypted values are no longer valid social security numbers. They have a very different format and don't fit all those restrictions I discussed earlier. So with format-preserving encryption, we're looking for schemes that encrypt social security numbers as valid social security numbers, fitting all of those constraints. That's the setup we're looking at here. And this doesn't just come up with social security numbers; there are many different applications: credit cards, email addresses, and so forth. OK, so I'm going to start today by talking about some background on format-preserving encryption and work that's been done in that area.
Then I'm going to introduce two specific problems within the area of format-preserving encryption, talk about our Cycle Slicer algorithm — I'll explain where the name comes from — and show how we can use this algorithm to tackle those two problems. And then I'll very briefly discuss some of the proofs behind it — not too much, since it's the last day of the conference; I'll spare you those details. OK, so a little background on format-preserving encryption: the first scheme was introduced in 1997, and the term itself, format-preserving encryption, which I'm abbreviating FPE here, was coined in 2008. There are many domains where we have solutions — for example, bit strings, or the integers up to N — and there's even a standard that was published by NIST in 2016. One potential solution to this problem: if you have a domain where there's an efficient way to rank your elements and unrank them, then you can use what's called the rank-encipher-unrank approach. If you have a target set — in our example, social security numbers — then to encipher a point, you simply rank the point to get an integer, use an existing cipher on the integers, and then unrank to get an element that's again a social security number. So that's one approach to the problem, and it has led to efficient algorithms in a number of domains. But there are some drawbacks. One is that this requires efficient ranking and unranking algorithms, which not all domains have, and today we're specifically looking at the setting where you don't have efficient ranking algorithms. Secondly, these ranking and unranking algorithms can leak timing information, so even when an efficient ranking algorithm exists, you may not want to use it.
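The rank-encipher-unrank idea can be sketched in a few lines. This is a hypothetical toy on four-digit PINs rather than social security numbers, and `toy_integer_cipher` is an insecure stand-in for a real cipher on the integers — all names here are illustrative, not from the talk or any standard:

```python
def rank(pin: str) -> int:
    # Map each valid 4-digit PIN to its index in 0..9999.
    return int(pin)

def unrank(i: int) -> str:
    # Inverse of rank: an index back to a valid PIN.
    return f"{i:04d}"

def toy_integer_cipher(i: int, key: int, n: int = 10_000) -> int:
    # Insecure placeholder for a cipher on {0, ..., n-1}; it is a
    # bijection because 7 is coprime to 10,000.
    return (7 * i + key) % n

def encipher(pin: str, key: int) -> str:
    # rank -> encipher on the integers -> unrank
    return unrank(toy_integer_cipher(rank(pin), key))
```

Because each stage is a bijection, the composition maps valid PINs to valid PINs, which is exactly the format-preserving property.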
OK, so I'm going to look at one way to approach this problem, which we call domain targeting. This is an existing approach — I think we came up with the name, but not the approach. We find a larger set X that includes our target set S, take an existing cipher on this larger set, and then transform that cipher into a cipher on our smaller target set S. Looking back at the running example of social security numbers, if we take X to be the set of 30-bit strings, there are many ciphers we can use. That leaves us with the problem of how to take a cipher on the larger set and turn it into a cipher on the smaller set. One way to do this is called cycle walking, which has been around in the literature for a while. With cycle walking, we start with a cipher on the larger set X, which we can think of as a permutation on X, and we want to encrypt a point in the smaller target set. We just evaluate the permutation on this point. If we get something that's in our target set S, great, we're done. Otherwise, we repeat. Eventually we'll either get back to the point itself or find something that's in our target set S. To explain this example a little further — and this will be helpful for the rest of the talk — think about it from the cycle structure perspective, the cycle structure of the permutation. If we look at these cycles and at the points in the cycles that are in our target set, what cycle walking essentially does is erase all of the other points, and then we're left with just a permutation on our smaller target set.
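A minimal sketch of cycle walking under these assumptions — the toy permutation and sets below are made up for illustration, not from the talk:

```python
def cycle_walk(x, perm, in_target):
    # Apply the permutation on the larger set X repeatedly until the
    # result lands back in the target set S. Since perm is a bijection,
    # walking around x's cycle must eventually reach a point of S
    # (at worst, x itself), so the loop terminates.
    y = perm(x)
    while not in_target(y):
        y = perm(y)
    return y

# Toy example: X = {0, ..., 15}, target set S = {0, ..., 9}.
perm = lambda x: (5 * x + 3) % 16   # a bijection on X (5 is coprime to 16)
in_S = lambda x: x < 10
```

Restricted to S, the resulting map is again a permutation of S, which is the "erasing the other points" picture from the cycle-structure view.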
So that's what happens from the cycle structure perspective. Cycle walking was formally analyzed by Black and Rogaway in 2002: if we assume that the size of the target set is a constant fraction of the size of the overall set, then we end up with a small expected running time. But the downside is that the worst-case running time can be really large. Basically, you can keep encountering points that aren't in the target set, and in the worst case you'll encounter |X| − |S| such points. In addition, different inputs can have different running times, which can potentially leak timing information. So you might ask, can we do better? Actually, last year at Asiacrypt — I recognize a few familiar faces — I gave a talk about another algorithm that overcomes some of these drawbacks. We called it reverse cycle walking. Essentially, instead of eliminating the non-S points in a cycle and creating a single cycle, we look at runs of contiguous points of S within each cycle, which breaks the cycle up into more cycles, and then we further restrict attention to cycles of size 2. So if we look at the cycle structure of the original permutation on X, we look for pairs of adjacent points in S that are surrounded by points not in S. That gives us a matching, and we apply this matching to get our permutation on the target set. One step of reverse cycle walking always takes constant time, which is nice. But as you might have noticed, a single step does not produce a random permutation.
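One round of the matching step just described can be sketched as follows: in each cycle we look for a pair of adjacent points of S whose cycle-neighbors on both sides lie outside S, and swap each such pair. This is a simplified sketch — it skips short-cycle edge cases and the extra randomness the real algorithm uses, and all names are mine:

```python
def rcw_round(perm, X, S):
    # Start from the identity on S; qualifying pairs get swapped.
    mapping = {x: x for x in S}
    seen = set()
    for start in X:
        if start in seen:
            continue
        # Recover the cycle of perm containing `start`.
        cycle, y = [start], perm(start)
        seen.add(start)
        while y != start:
            cycle.append(y)
            seen.add(y)
            y = perm(y)
        n = len(cycle)
        if n < 4:
            continue  # too short for a surrounded pair (edge cases elided)
        for i in range(n):
            a, b = cycle[i], cycle[(i + 1) % n]
            before, after = cycle[(i - 1) % n], cycle[(i + 2) % n]
            # Keep the pair only if it is a run of exactly two S points.
            if a in S and b in S and before not in S and after not in S:
                mapping[a], mapping[b] = b, a
    return mapping
```

Every round costs constant time per point, but most points map to themselves, which is why many rounds are needed.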
Even if we start with a random permutation on X, the result is certainly not random after a single step — note that many points just get mapped to themselves. So the question that comes up is: it's not random, so how many rounds of this do we need? It turns out we need on the order of log |X| rounds, and then we end up with a random permutation. So we've lowered the worst-case running time from O(N) to O(log N), but we've definitely lost the expected-constant-time behavior that we had with original cycle walking. The other downside of this algorithm is that it performs poorly when the size of S approaches the size of the larger set. Thinking back to that cycle structure again: if a cycle has many contiguous points of S, we end up just ignoring them, so as |S| approaches |X| we end up not making much progress in a single round. We're going to see that the algorithm I'm talking about today overcomes this drawback. OK, so that was domain targeting. The algorithm I'm going to talk about today — eventually I'll get there — Cycle Slicer, also applies to a different problem related to format-preserving encryption schemes, so let me introduce that problem briefly before I come back to Cycle Slicer. The idea here is that prior to format-preserving encryption schemes, practitioners used tokenization systems. Essentially, that means they would lazily sample a random permutation: they'd keep input-output pairs in a table and add to the table as needed. Later on, an efficient or practical format-preserving encryption scheme becomes available for their application, and they want to start using it. But in many cases they can't just ignore these tables.
They've already handed out tokens, so they need to preserve the pairs that are in the table. The question is: how can we add a format-preserving encryption scheme while preserving these existing pairs? Let me formalize this a little further. We're given a table of input-output pairs, where the two sets T (inputs) and U (outputs) are subsets of some larger set X. Our goal: given a cipher on the set X, we want to create a cipher on the set S that preserves these existing pairs. So we want to create a permutation on the whole set S that extends the partial permutation we're forced to live with. This problem was introduced recently by Grubbs, Ristenpart, and Yarom at Eurocrypt 2017, and they proposed two different solutions that I'll talk about briefly. One of them is very similar to the rank-encipher-unrank algorithm we saw before — it's a modification of that algorithm. They take all the points on the input side of the table, rank them, and create a table of those ranked points. Then, when they go to rank a point later on, they do a binary search on this table to figure out where it fits relative to the points that are already mapped. I went over that very fast, and you don't need to follow exactly what's going on there; basically, it's the rank-encipher-unrank approach with an added binary search on the table. Again, this is great, but it has the same drawbacks the rank-encipher-unrank algorithm had before: there are domains where we don't have efficient ranking algorithms.
And it's susceptible to timing attacks, both from the binary search and from the ranking itself. They also introduced another algorithm, which is in some ways analogous to cycle walking — it's really neat how it works — and it only requires the ability to test membership. Essentially, in this problem we have the table of existing pairs, so there can be points that appear only in the output column of the table, or only in the input column, and we have to be really careful how we handle those points. If you try to encrypt a point just by using your permutation on X, and it maps to something that appears only in the output column of the table, we can't simply apply the permutation on X again, because that output-column point itself also has to be mapped somewhere. So plain cycle walking breaks down. Instead, what they do is take that point, look at the matching input point in the table, and then look at where that gets mapped to. So they do this zig-zag thing, and it's quite nice. But like cycle walking, it has similar drawbacks: expected constant time, but O(N) in the worst case, for similar reasons. We're going to talk about a way to overcome this worst case. OK, so finally, our algorithm, which we call Cycle Slicer. I'll start by describing it at a high level, and then I'll show how to apply it to the domain targeting and domain completion problems. At a high level, we take a permutation on some set, we go from this permutation to a matching, and then we keep only certain edges of the matching.
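Here is one plausible reading of that zig-zag step as code — my sketch of the idea as described, not the exact algorithm from the Grubbs et al. paper; the toy permutation and table are made up. When the walk lands on a point that appears only in the output column, we follow the table edge backwards to its input point and continue from there:

```python
def zigzag(x, perm, table):
    # table maps input points T to output points U; invert it once so
    # we can follow table edges backwards.
    inverse = {u: t for t, u in table.items()}
    y = perm(x)
    while y in inverse and y not in table:
        # y appears only in the output column, so y is already "taken":
        # jump to the input point whose table image is y and keep walking.
        y = perm(inverse[y])
    return y

# Toy example: X = {0, ..., 5}, the table preserves the pair 0 -> 2.
perm = lambda x: (x + 1) % 6
table = {0: 2}
```

Points in T are enciphered straight from the table; everything else goes through `zigzag`, and together the images form a permutation that preserves the table pair.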
To decide which edges to keep, we define what we call an inclusion function, and it's different for domain targeting and for domain completion. The name comes from how we go from the permutation to the matching: we take the cycles of the permutation and break them up. Each point in a cycle flips a bit, which gives it a direction, and if two adjacent points' directions match — for example, 7 and 4 are pointing towards each other — then we include that edge in our matching. So we can take a large cycle and slice it up into some number of pairs just using these direction functions: if the directions match, we keep that edge; if they don't, we don't. In domain targeting, it's pretty straightforward to apply this algorithm: we simply keep only the edges where both endpoints are in our target set. Thinking back to that cycle structure example, when we get long runs of contiguous points that are all in S, instead of just ignoring them we break them up and keep certain edges within the run. In terms of advantages over the rank-encipher-unrank approach, this only requires the ability to test membership, and we get the lower worst case — the same advantages that reverse cycle walking had. The additional advantage is that we end up with better performance when the size of S is close to the size of X, and we have some numbers on the improvement in the social security example. We have to be a little bit careful here, because we don't want to cheat too much: the comparison with reverse cycle walking is actually more honest if we use a larger set X.
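One round of this idea for domain targeting can be sketched as below. The names are mine, and I elide the extra per-edge coin the real algorithm flips: each point picks a direction along its cycle, an edge joins two points pointing towards each other, and the inclusion function keeps an edge only when both endpoints are in S:

```python
import random

def cycle_slicer_round(perm, points, include, rng):
    # Each point independently picks a direction: forward along its
    # cycle (towards perm(x)) or backward (towards its predecessor).
    direction = {x: rng.choice(("fwd", "back")) for x in points}
    mapping = {x: x for x in points}   # unmatched points stay fixed
    for x in points:
        y = perm(x)
        # x points forward at y, and y points backward at x: that edge
        # of the cycle goes into the matching...
        if x != y and direction[x] == "fwd" and direction[y] == "back":
            # ...and is kept only if the inclusion function approves it.
            if include(x) and include(y):
                mapping[x], mapping[y] = y, x
    return mapping
```

No point can be matched both forward and backward (it would need both directions at once), so the kept edges always form a matching and each round is an involution.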
Instead of 2^30, we use 2^32, and then we get a probably more honest improvement factor of about a quarter. OK, so let me talk about how we can apply it in the domain completion setting. Again, we have this table, and we need to preserve the table pairs. To help understand the algorithm, think about the cycle structure of the table. The table is like a partial permutation — we have these input-output pairs — so we get cycles, we get single points (things that aren't in the table at all), and we get lines. Conceptually, stepping back from the details, our algorithm ignores any cycles: if the table has cycles, that's already a complete permutation on those points, so we can leave them alone. Then we take any lines and condense each one down to a single point. Then we find a permutation on this strange set — cycles ignored, lines collapsed to single points — and finally we expand the lines back out and add the cycles back in, which gives a permutation on the original set. So at a high level, that's what we're doing: collapse the lines, ignore the cycles, get a permutation on this strange set, then add back the cycles and expand the lines. This sounds straightforward enough, but the implementation details are a bit tricky, especially since we want to be able to evaluate it for a single point — we don't get this overall global view. OK, so how do we do this?
Well, I'll talk about it briefly. We have a point. If the point is in T, so it's on the input side of the table, then we're done — we already have its match in the table. If not, we do a little bit of preprocessing. If the point is in the output side of the table but not in the input side — these are the points at the ends of the lines — then we want to map it to the first point in its line. So if it's the last point in a line, we map it to the first point in the line and then run Cycle Slicer; for every other point, we just run Cycle Slicer as normal. What's nice is that this "first" function — and we'll also need "last" if we want to go the other way — can be precomputed and stored with the table. As for the inclusion function, we keep edges in the matching only if both points are in X minus the table points. OK, so the advantages: again, we only require the ability to test membership; we can precompute first and last; and this lowers the worst-case running time from O(N) to O(log N). Note that I haven't told you at all where this log N comes from, because, just like with reverse cycle walking, we have to run multiple steps — a single step is not going to give us a random permutation — so we have to run many steps and analyze that process. It's no longer an expected-constant-time algorithm, but we get some of the other benefits that we had with reverse cycle walking. All right, so in the last couple of minutes I'll talk briefly about this question of analyzing the process. We also have a proof of correctness, showing that our algorithm actually generates a random permutation.
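Conceptually, the preprocessing just described amounts to a wrapper like the following. This is my sketch: `first` and the stand-in `run_cycle_slicer` are placeholders, and I elide how lines are expanded back out for the inverse direction and how table cycles are handled:

```python
def complete_encipher(x, table, first, run_cycle_slicer):
    outputs = set(table.values())
    if x in table:
        # x is in T: the table already fixes its image.
        return table[x]
    if x in outputs:
        # x is in U but not in T: it is the last point of a line, so
        # collapse the line by jumping to its first point (first() can
        # be precomputed and stored alongside the table).
        x = first(x)
    # Everything else just runs (multiple rounds of) Cycle Slicer.
    return run_cycle_slicer(x)
```

The only primitives needed are table lookups, membership tests, and the precomputed `first`, which matches the advantages listed above.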
But the more interesting part of the proof, I think, is the question of how many rounds of Cycle Slicer we need. This question is similar to one that came up when we were analyzing reverse cycle walking. Essentially, in both problems we're applying a matching in each step, and we ask how many steps we need before we end up with a random permutation. To answer that, we can think of this as a Markov chain: at each step we apply a matching, and we ask how many times we have to run it before we end up with a random permutation. This is something called a matching exchange process. More formally, in a matching exchange process, at every step we pick a number according to some distribution, then generate a random matching of that size and apply it — a different matching at each step. Czumaj and Kutyłowski called this a matching exchange process and analyzed it in 2000, and we were able to borrow a lot from their analysis. But one of the downsides of their analysis, which we encountered when looking at reverse cycle walking, is that they give asymptotic results and aren't really interested in the constants involved. In the context of reverse cycle walking, we were able to give explicit bounds on these constants, but those bounds were very specific to reverse cycle walking; in this work we were able to generalize them and apply them in other settings too. All right, so this is just the formal definition of a matching exchange process, and it's relatively straightforward to show that Cycle Slicer, both in the context of domain targeting and in the context of domain completion, yields a matching exchange process.
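A matching exchange process in this sense can be simulated directly. In the sketch below, the uniform size distribution is a stand-in assumption of mine, not the distribution from the Czumaj–Kutyłowski analysis:

```python
import random

def matching_exchange_step(state, rng):
    # Pick a matching size k from some distribution (here: uniform),
    # choose 2k distinct positions, pair them up, and swap the contents
    # of each matched pair of positions.
    n = len(state)
    k = rng.randint(0, n // 2)
    positions = rng.sample(range(n), 2 * k)
    for i in range(k):
        a, b = positions[2 * i], positions[2 * i + 1]
        state[a], state[b] = state[b], state[a]
    return state
```

The mixing question is then how many such steps are needed before the composed permutation is close to uniform.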
Then we use some of the techniques that were developed previously to analyze how long it's going to take. All right, so that was the algorithm. In terms of future directions: we did some work on coming up with these constant bounds, but I think they are definitely artifacts of the proof, not of the actual algorithm itself, so there's definitely improvement to be had there. I also hid one particular detail, which is that after we pick out our matching, we flip a bit for each of the edges in the matching to determine whether we keep it or not. I think that's probably not necessary, but again, it's an artifact of the proof techniques: we need that extra bit of randomness there to make the coupling argument go through. And at the end of this, it still leaves me personally rather unsatisfied, because the expected constant running time that you get from the zig-zag algorithm and from cycle walking is so appealing that it would be lovely to find a way to balance the two without having to pay that worst-case penalty. All right. Thank you.

You perform a number of permutations one after the other in order to get security. However, a group of permutations can be easily generated from just two permutations — one random permutation and one very simple permutation, such as complementing or flipping a bit. So have you considered proofs in which you have only one permutation, which you mix together with some simple permutation and then repeat logarithmically many times? This could simplify the implementation, because you have to produce a very large number of different permutations, but you could use only one of them.

Yeah, we haven't looked at that.
My offhand guess would be that it would make the analysis much more complicated if the permutations are no longer independent, but maybe if we can make some sort of assumption that they're close to independent, we could build that into the bounds.

Okay, thank you. Do we have one more question?

When you run one round, I see you can trace where a single point goes, but when you run many rounds, is it the case that you need to trace the positions of all elements?

No, you can trace a single point through all the rounds, because you know which position you're swapping with, and so you can trace it through every round.

Yeah, but when you apply the same — okay, you know which position you go to, but then the other point: if you were calculating the trajectory of the other point, it possibly went to this point.

Yeah — it is possible to trace a single point. Because of the bit flips, which determine where points get matched and which position each one moves towards, you can trace where a single point goes: you don't care what point is at the position you're swapping with, you only care where you're moving your point to and what the bit flip for that pair is. We can talk more offline, too, if that didn't answer your question.

Okay, thank you so much.