Hello, everyone. We're getting close to the end of the conference with the last session of Track 5, on symmetric-key designs. We will have four talks. The first one is by Ivica Nikolić from the National University of Singapore. The title of the work is "How to Use Metaheuristics for Design of Symmetric-Key Primitives". Ivica, please.

Thank you for the introduction. So, metaheuristics are automatic tools. First, let's take a brief look at these automatic tools and try to understand what kind of problem we are trying to solve. An automatic tool is an algorithm that solves a particular problem and is feasible to run. The emphasis is on "feasible to run": we only consider such algorithms. In symmetric-key crypto we use various automatic tools, both in analysis and in design. On the analysis side, in the last several years we have seen an explosion of these automatic tools: we now have tools that can automatically check the security bounds of ciphers and hash functions against differential attacks, linear attacks, meet-in-the-middle attacks, and so on. On the design side of symmetric-key crypto we also use automatic tools. We are used to finding good round transformations with them, for example good S-boxes that satisfy certain criteria, good diffusion layers, and so on. But we also use analysis tools for the design of entire primitives, not only the round transformations. How do we do this? Basically, we assume we have an analysis tool that checks a primitive against a certain type of attack, we fix a class of primitives, and then we apply this tool to the class. For example, when the class of primitives is small, what kind of tool do we use? We use brute force. Say we want to find better ShiftRows offsets in AES that produce a new, tweaked AES with better resistance against impossible differentials.
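The brute-force strategy for a small class can be sketched in a few lines of Python. The objective here is a toy stand-in for a real security-bound tool (it rewards permutations that displace every element far from its position); all names are illustrative, not from the talk.

```python
from itertools import permutations

def brute_force_best(objective, n):
    """Exhaustively score every permutation of n elements and keep the best.
    Feasible only when n is small, since there are n! candidates."""
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        score = objective(perm)
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score

# Toy stand-in for an expensive security-bound computation (hypothetical):
# total displacement of the permutation.
toy_objective = lambda p: sum(abs(i - x) for i, x in enumerate(p))

best, score = brute_force_best(toy_objective, 5)  # 5! = 120 candidates
```

For a 16-element permutation the same loop would need 16! ≈ 2^44 evaluations, which is exactly why brute force only applies to small or restricted classes.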
But we already know there is a tool that evaluates ciphers against impossible-differential attacks, and because the number of possible new shift offsets is very small, we can go through all of them, apply the tool to each, and find the best offsets. So when the class of primitives is small, we use brute force. And when the class of primitives is large, of course, we use random search. For example, if we want to find a better MixColumns matrix in AES that gives better resistance against impossible-differential attacks, again we assume we have the tool, the method that returns the security bound of any cipher against impossible-differential attacks, and you should know that such a tool exists. But in this case there are a lot of possible MixColumns matrices, so we can only pick a small subset and find the best among them.

So the question, and this is what we are trying to answer, is: when the class of primitives is large, can we come up with a better strategy, something better than random search? This strategy has to be efficient, at least not worse than random search. It has to be versatile, so it can replace random search in many cases. And of course it has to be simple, otherwise people will not use it. This new strategy is metaheuristics. So what is a metaheuristic? It's a partial search algorithm used to find a sufficiently good solution to an optimization problem. We have some objective function f(x) and we are trying to find, let's say, the maximum of f(x). The emphasis here is on a "sufficiently good" solution: just like random search, which will not find the best solution but some sufficiently good one, the same holds for metaheuristics. Why are they called metaheuristics? Because they are high-level heuristics, as we will see, so kind of universal. So what are the main features?
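Random search, the baseline that metaheuristics try to beat, can be sketched as follows. The 16-element-permutation sampler and the placeholder objective are assumptions for illustration; in the talk's setting the objective would be an expensive security-bound computation.

```python
import random

def random_search(objective, sample_candidate, budget=1000):
    """Baseline strategy for a large class: draw independent random
    candidates and keep the best one seen within the evaluation budget."""
    best_x, best_val = None, float("-inf")
    for _ in range(budget):
        x = sample_candidate()
        val = objective(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Example: search over 16-element permutations (as for a nibble permutation),
# with a cheap placeholder objective standing in for a security-bound tool.
sample = lambda: random.sample(range(16), 16)
objective = lambda p: -sum((p[i] - i) % 16 for i in range(16))  # hypothetical
best_p, best_v = random_search(objective, sample, budget=200)
```

Note that random search, like the metaheuristics discussed next, needs only black-box access to `objective`.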
The first feature is approximation: whenever you use them, you should understand that they will not necessarily return the maximum. On the other hand, they are practical: once you decide to stop them, they will provide some good solution. Like random search, they require only black-box access to the objective function. And they are very simple to implement, almost as simple as random search.

The best analogy for metaheuristics is the Swiss Army knife. A Swiss Army knife is a universal tool: it can be applied to many different tasks, but for none of the tasks is it the best, right? It has a fork, it has a spoon, it has a knife; you can do a lot of things with it, but if you take a full-blown knife, it is much better than the Swiss Army knife's blade. Same thing with metaheuristics: they are really good if you are trying to solve a particular problem and have no other way of solving it, but that doesn't mean what a metaheuristic returns is the best possible solution. So use them when you don't have any better approach; otherwise, try to come up with your own algorithm.

Metaheuristics can be classified in several ways. According to the solution type, some try to find a local optimum of the objective function, and some the global optimum. According to the candidate-set size, we have single-solution and population-based metaheuristics. What does single-solution mean? All of these metaheuristics, as we will see, are iterative: they work with some candidate solution and through iterations try to improve it. Single-solution metaheuristics work with one solution, whereas population-based metaheuristics work with a set of solutions. Also, most metaheuristics are inspired by some natural phenomenon, though some are not. This is an incomplete list of different metaheuristic algorithms; as you can see, there are a lot of them.
We are going to focus on two very famous metaheuristics that have been around for a while: simulated annealing and genetic algorithms.

First, let's focus on simulated annealing. It tries to find the global optimum, it works with a single solution through iterations, and it is nature-inspired, by annealing in metallurgy. Simulated annealing is very similar to the hill-climbing algorithm, but it has a special mechanism for escaping local maxima. Let's take a look how. This is the hill-climbing iteration. Hill climbing is an iterative algorithm: you have a candidate solution, some point, and from this point you try to build another point, and another, so that eventually you find the optimum. As you can see, it's extremely simple: if you have some candidate, you just generate another candidate in the neighborhood of that point, and if the value of the function at the new point is higher, you accept it as the new candidate. It cannot be simpler than this, right? For example, if you run this: here is our initial candidate x, we select another candidate in the neighborhood, then another, and another; all of these get higher values of the objective function, so we accept them. If the value is lower, we cannot accept it, we have to go back, and eventually hill climbing finds an optimum, but only a local optimum. As you can see, it always gets stuck at a local optimum, so unless we start somewhere over here, we will never find the global optimum, and that is our task. So this was the hill-climbing iteration, and as you can see, simulated annealing is very similar; it just has a few more lines of code.
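The hill-climbing iteration just described can be sketched in Python. The one-dimensional objective with a single peak at x = 3 and the ±0.1 step are toy stand-ins, not from the talk.

```python
import random

def hill_climb(objective, x0, neighbor, iters=5000):
    """Hill climbing: propose a random neighbor and move to it only if it
    scores strictly better. Gets stuck at the first local maximum reached."""
    x, fx = x0, objective(x0)
    for _ in range(iters):
        y = neighbor(x)
        fy = objective(y)
        if fy > fx:          # accept only improvements
            x, fx = y, fy
    return x, fx

# Toy objective: single peak at x = 3, so hill climbing suffices here.
f = lambda x: -(x - 3.0) ** 2
step = lambda x: x + random.uniform(-0.1, 0.1)  # neighborhood function
x_best, f_best = hill_climb(f, x0=0.0, neighbor=step)
```

With a multi-modal objective the same loop would stop at whichever local peak is nearest to the starting point, which is exactly the weakness simulated annealing addresses.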
Just like hill climbing, it generates a new candidate in the neighborhood; if the value of the function at this new candidate is better, it accepts it. But if the value of the function is worse, it can still accept, only with some probability that depends on the step of the iteration. At the beginning it will accept solutions with a much worse value of the objective function than the current candidate, and later this probability is reduced; this is the probability here. So it has an additional parameter, the temperature T, which changes with time, getting smaller and smaller, and with it the probability of accepting degrading solutions is reduced. For example, if we start here, we generate a new point, it's better for the objective function, so we can go up, up, up, up. And we can even go down: if this random number between 0 and 1 is smaller than this value, we can still go down, though not always, of course; it depends on this probability. So we go up, down, and with luck we can escape, move to another hill, and then eventually find the global maximum.
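The acceptance rule just described can be sketched as follows, using the inverse cooling schedule T_k = T_0 / (1 + k) that comes up later in the talk. The two-peak toy objective and the step size are illustrative assumptions.

```python
import math
import random

def simulated_annealing(objective, x0, neighbor, t0=1.0, iters=20000):
    """Simulated annealing with inverse cooling T_k = T0 / (1 + k).
    A worse candidate is still accepted with probability
    exp((f(y) - f(x)) / T_k), which shrinks as the temperature drops."""
    x, fx = x0, objective(x0)
    best, fbest = x, fx
    for k in range(iters):
        t = t0 / (1 + k)                  # cooling schedule
        y = neighbor(x)
        fy = objective(y)
        if fy >= fx or random.random() < math.exp((fy - fx) / t):
            x, fx = y, fy                 # accept, possibly downhill
        if fx > fbest:                    # remember the best point seen
            best, fbest = x, fx
    return best, fbest

# Toy objective with a local maximum at x = -1 (value 1) and the
# global maximum at x = 2 (value 2); we deliberately start on the local peak.
f = lambda x: max(1 - (x + 1) ** 2, 2 - 4 * (x - 2) ** 2)
step = lambda x: x + random.uniform(-0.5, 0.5)
x_best, f_best = simulated_annealing(f, x0=-1.0, neighbor=step)
```

The only differences from hill climbing are the temperature parameter and the probabilistic acceptance of downhill moves, which is what lets the search escape the first hill.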
So as you can see, the parameters of simulated annealing: we have to define the neighborhood function, the initial temperature T, and how we reduce the temperature, which is called the cooling schedule. That's the whole algorithm.

Now let's move to genetic algorithms. A genetic algorithm also tries to find the global optimum, it is population-based, so it works with a set of candidates, and it's also nature-inspired, based on genetic reproduction and survival of the fittest, as we'll see now. As we said, it's population-based: you have a population of N individuals, basically a set of inputs to the objective function, and it is an iterative algorithm; from this generation we try to create a new generation of individuals, a new set of inputs to the objective function. How do we do that? Using two methods. The first is reproduction: as in nature, we take two candidate solutions and produce two new candidate solutions. The second method is mutation: we take one candidate and produce a new candidate by changing the original slightly.

How do we decide who becomes a parent in this reproduction function and who gets mutated? Parents are decided by something called the selection function, which is biased towards individuals on which the objective function achieves higher values. As in nature, if the objective function is better on some input, then in a sense this input will become a parent and pass its genes to the next generation. A single input can become a parent several times; for example, this point can become a parent in another reproduction, and some of the inputs will never become parents, usually those with a very low value of the objective function. There are different types of selection functions.

Now let's see how the reproduction function works, and this is why it's called a genetic algorithm. The input, the arguments of the objective function, can be a vector, and the coordinates are called genes. Here, for example, we have four, which means the objective function takes four inputs, and these are the values of the inputs of this candidate. When we use the reproduction, also called crossover, function, we have two parents and we have to produce two children: we assign random genes from the parents, so one child takes these two genes from this parent and those two genes from the other parent, and that's how we produce a new candidate input to the objective function. Mutation, as I said before, slightly changes some of the individuals: with a very small probability it changes an individual, and it changes it very slightly, for example here a 0 goes to a 3. OK, so those are genetic algorithms.

So we now have two metaheuristics, and we try to apply them to the design of symmetric-key primitives. Where can we use metaheuristics? First, we have to clearly define the optimization goal, so the objective function has to produce a numerical value and has to be clearly defined. We use metaheuristics when, of course, the search space is large; otherwise we can always use brute force and there is no need for metaheuristics. We also have to keep in mind that a metaheuristic will not necessarily produce the optimal solution; it will produce a sufficiently good solution in most cases.

We tried to apply these metaheuristics to two different kinds of symmetric-key primitives. The first is SKINNY, which is a new tweakable block cipher. Why SKINNY? Because it is highly optimized, and the designers also used various automatic tools to produce the parts of the design, to make it very secure and very efficient. I'm not going to explain how it works; basically it's very similar to AES, but it has a key schedule, and in the key schedule there is a 16-element permutation of the nibbles. Apparently the permutation the authors proposed was found by brute force in a restricted space, so not 16 factorial but a smaller space, and because it's smaller they can use brute
force and they restrict the space because it's very efficient in this space so what can we do for example can we find a better permutation if we remove this restriction maybe the cipher is not going to be more efficient but maybe it's going to be more secure so if we want to formulate our optimization problem and apply the metaheuristics first we have to define the input and the input is a 16 element permutation so the input to the objective function is 16 element permutation with some additional restrictions I'm going to go into details and the objective function same like designs we want to return security bound of skinny against related tweaking differential attacks and this part so basically there is these automatic tools if you provide a cipher it will speed out the bound against impossible differential attacks sorry against related to differential attacks and this approach is based on integer linear programming the design has already used that so we're going to use as well this function the input is a permutation and then the objective function will say how many active S boxes are in the best related to differential trail when this permutation issues is key so that's our formulation of our problem so if you want to use the simulate annealing of course we have to define these few parameters what is the neighborhood function so because the input is a permutation it's a neighborhood function we can use one or two swaps in the permutation then we have to fix some initial temperature and we have to define the cooling schedule in other words how the temperature reduces over time and this is quite typical cooling schedule it's called inverse cooling and if you want to apply genetic algorithms we also have to define some parameters of population size and selection function useful for different types mutation and crossover so all of this is quite easy to define except maybe the crossover because the inputs are permutations the outputs also have to be permutations so 
we cannot just take random genes in the children and these are the results of the optimization so this is the number of evaluation of the objective function we used this is the original in the original paper the objective function is 27 which means 27 active S boxes in the best trail with some number of problems we managed to achieve 33 and 33 with around 1,000 evaluation of the objective function and we also applied to another symmetrically primitive some AS based constructions from MFSC I'm not going to go into details how they look like anyways we applied these six types that were already presented in that paper so this is what originally was presented and this is what we got so as you can see we kind of managed to improve these six out of seven and some of the improvements are quite high as you can see an interesting thing which is even simpler than genetic algorithms I'll perform genetic algorithms in this particular case so to conclude metacritics are very handy automatic tools so when the surface is large and we have no better idea what to do and how to cover the space how to find something in space the next best thing I would say are application of metacritics they're extremely easy to implement it can be optimized, it can be used to optimize designs according to other aspect criteria and not only security as long as the objective function is clearly defined so from my personal experience because I learn about this metacritics only a year ago and from all the experiments I've done I can say that how metacritics work is basically the beginning when you start trying here because all of these things that I implemented actually require a normal amount of time because the objective function is very expensive this integer linear program so basically to put it in the objective function we need maybe a few minutes because it has to produce a solution so they are very good at the beginning immediately provides some okay solution metacritics but then if you find you 
want to improve the solution, it takes a lot of time. So maybe you should use them only at the initial stage, when you don't have any better idea of how to fix some parts of the design: just run the metaheuristic, let it output some OK solution, then maybe with manual analysis try to understand why it is a good solution, and then build something even better. That's all, thank you.

A comment and a question. The comment is that automated search techniques have a very long history: the S-boxes of DES were actually chosen by the designers at IBM by searching within a large search space, using all kinds of shortcuts, to find the best S-boxes according to differential-probability criteria. Now, my question is the following. In order to have some success with all these metaheuristics, you need a continuous function, because if it is totally random and there is no connection between a point and its neighbors, then small variations of the neighbors are useless; it's just like random search. Now, when I try to think about the effect of changing the order of the permutation of the nibbles in the design, and I assume that you are repeating it from one subkey to the next subkey, I don't see any reason why the function is going to be continuous. I can see a situation in which you have a very good solution, you make one swap of two elements, and you get a terrible solution. So can you explain why it succeeds?

There are such cases, as you mentioned, but there are other cases where this doesn't happen. In some of the cases, when you switch a little bit, it gets improved. So yes, there are cases where one switch takes you from really good to terribly wrong, but there are cases where it doesn't; it's mostly continuous, in a sense. I cannot fully describe it; I don't know what it looks like.

There are easy measures of how continuous a function is: you just take random pairs of neighbors and see what the difference between them is. It would be very interesting to look at the standard
measures of continuity, whether your objective function is sufficiently continuous.

Yes, that's a good idea; I have not done that. Thank you.

Actually, to go further with this idea: first, suppose you have an objective function that is continuous; maybe it's not that far from being differentiable, and so you could try to use gradient descent on your elements, stochastic gradient descent for instance, like we do with neural networks, to find an optimal solution much, much quicker. Have you tried that approach?

The thing is, these objective functions are not elementary. Imagine a function that in the background runs an integer linear program that outputs some number; we don't know what the function looks like, we are just querying it as a black box. Of course, if there is a mathematical definition of the function, we are going to use mathematical methods; this applies only when we have no idea what the function looks like and can only query it. And usually these functions are very expensive, so you cannot do 2^20 queries in a second; actually, as I said before, one query can cost minutes. So no standard method of optimization will work there.

And I had another question: is there anything open source, can we reuse these tools?

I have not shared it, but yes, sure; the program you have seen is quite trivial to program, but I can share it.

If you were to share it, probably a lot of people could help you improve it; you could have some people willing to help.

I am just wondering, did you compare the result, I mean the better permutation, against random search? You just used an annealing algorithm and a genetic algorithm, but without comparing the result against random search; is it better?

No, I didn't, because I assumed that what the designers proposed, they had already used something more advanced than random search; that was the assumption.

But actually you just relaxed the constraint on the permutation, so your result should be better; I mean, it could be better just for that reason.

I completely agree, yes, it's not a completely fair comparison. That's why I have the second case of these AES-based constructions, where it is a completely fair comparison. The thing is, with SKINNY I found out only at the end that they have this constraint; I had run the full search for two weeks with this additional constraint not taken into account, and then I didn't want to throw away the result. It's not completely fair, and it's not surprising that I got better numbers. So I come from the point of: OK, what if we relax? That's why I'm saying we are relaxing: what if we relax, can we come up with something better? I'm not claiming I produced something better under the original constraints.

One comment and one question. The comment is about the function: my guess is that the function is flat most of the time, like locally flat, then you get bumps, and then it's flat again, because if you change something, the best differential path often stays the same; I guess that's what the function looks like. And my question, I'm sorry I arrived too late for your presentation, but does your program actually output the differential path?

It can output it; in Gurobi you can actually output the differential trail found by the integer linear program, and I use Gurobi. So yes, if I want, it can output it.

Let's see how much time we have; I think we are running out of time already. We can take it if it's short.

About the improvement rate over the 1,000 iterations: does it improve continuously? And did you stop at 1,000 only because you didn't have more time, or did it flatten out after 500, or where did it start?

As I recall, it gradually improves, then it has a big jump, then it goes down a bit; it kind of improves and then reaches some plateau and stays there.

So if you increased it to
10,000, you wouldn't expect more improvement?

No, I don't expect so. Thank you.