Good morning, everyone. Welcome to Functional Conference 2022. We are at the talk by John Azaria about nature-inspired optimization algorithms. So John, thank you so much for joining us, and I'm handing the mic over to you.

Thanks very much. It's a pleasure to be at a functional conference; it's been a while since I've been at a specifically functional conference. Over the last couple of years I've been fooling around with writing some optimization algorithms, and I thought this would be a good time to share some of the learnings I've had implementing nature-inspired optimization code in F# and Rust. My name is John Azaria and I'm a principal architect at Microsoft; I work on the Azure Kubernetes team. I used to work with these fine people at the Microsoft Quantum team, where I was one of the people who built the first version of Q#, the programming language for quantum computers. I want to acknowledge Dr. Helmholtz-Kreber specifically for giving me a lot of mentorship and helping me build out my skills in the optimization space. There's a disclaimer that this is not an official Microsoft talk, so for actual official Microsoft Quantum material you will need to contact the Microsoft Quantum team.

All right. We have a lot of material to cover today, so I just want to walk through the agenda a little bit and then get moving. I'm happy to take questions at the end, so please put your questions in and I'll address each of them as we go along. Let's start by talking a little bit about the current state of classical computing: what are the big problems we have, and why can't we actually solve them? Then a little introduction to quantum computing, segueing into quantum optimization and optimization in general. And we'll wrap up by showing some code and talking about magnets and long walks.

All right. In terms of classical computing, over the last 48 years the trend is very clear in how machines have evolved. We have a lot of transistors on chips, and the transistors are really, really small now, but clock speeds have plateaued for about 20 years. That's the reality, and it's fundamentally because we're hitting the limits of physics at this point. Take the orange trend you see here, the number of transistors on a chip: you can't make chips any bigger, and you can't make the transistors any smaller, because we're reaching the point where a transistor is literally toggled by a single electron, and you can't remove the heat fast enough. So this trend, while it has held for a long time as Moore's law, will plateau. You've already seen typical power plateau because of heat dissipation, typical frequency plateau, single-thread performance plateau, and eventually the number of transistors per chip will plateau as well. So we've reached, in some sense, a dead end, and we need to do something to improve further.

Now, a reasonable question to ask is: why bother? Haven't we solved all the big problems; why do we even need bigger and faster machines? The reality is that there are some problems we can't really solve. For example, when we want to do drug discovery or carbon capture, or any of the emerging technologies that are needed, any time we want to simulate a natural process, it turns out that classical computing doesn't scale up well enough.
For example, let's look at hydrogen, which is a single-electron system. Just to map out the states of that single electron requires us to know quite a bit about its quantum state. As you add electrons to a molecule, every electron you add doubles the space required, and we hit a very hard limit very fast: when you want to simulate a thousand-electron system, a reasonably sized molecule, you simply don't have the space. There are only on the order of 2^270 atoms in the observable universe, and for a thousand-electron system you need on the order of 2^1000 coefficients for the calculation. We just run out of space in the universe to do this kind of thing, and that's a reality we hit very fast with any problem relating to molecular simulation.

Then there's cryptography. This is a 2048-bit number, and factorizing it is necessarily hard; in fact, we depend on recovering its two prime factors being intractable for our cryptosystems to work properly at all. Even a 1024-bit number, half the length of that thing, would take on the order of half a million core-years to factor. We need that hardness because we depend on it. And if you want to build systems that either defeat that cryptography or replace it with something better, you need machines that can do math which is simply out of reach classically.

Another common class of problems are the NP-complete problems. These are problems for which no polynomial-time algorithm is known, and they turn out to be profoundly useful to industry: you'll all have come across the travelling salesman problem, the knapsack problem, and a couple of others, which are in wide use. It turns out that all NP-complete problems are equivalent, for some reasonable meaning of the word equivalent: they can all be transformed into each other in polynomial time. So solving one of them effectively gives you, within a polynomial-time transformation, a solution to any of the others. But no polynomial-time algorithm is known for any of them, and it's widely believed that none exists. So these are very, very difficult problems. Unfortunately, we're really dependent on them for any number of real-world solutions: any time you want to ship a box from Amazon, optimally pack a ship, or route a packet from one place to another, you're running into exactly these problems. So it's vitally important that we be able to solve NP-complete problems, and let's quickly see whether quantum computing has any impact on them.

So, in a nutshell, and I'm going to really rush over this because there's a wealth of information out there, we have some basic quantum computing concepts. The key takeaway is that solving a problem twice as big only requires one additional qubit: if you just add an extra qubit, you can solve a problem twice as big. And you can do the simulation piece for molecules by mapping qubits to electrons; we have well-known, well-researched mathematical transforms to do this kind of thing, so it's fairly straightforward as long as you have enough qubits. Similarly, we also have a proven exponential speed-up for integer factorization. This is the one core algorithm that everybody knows: Shor's algorithm.
It shows that you can factorize an integer in polynomial time, which vastly improves the time it takes to do integer factorization. However, the state of the art in quantum computing isn't a very happy one at the moment. Quantum hardware is still in its infancy: we're still researching what kinds of qubits we can build and what their characteristics are. The qubits we have today are extremely noisy, which means they only hold their value for a very short period of time, and they're incredibly susceptible to interference from the environment. Building a logical qubit, one that can actually carry a computation for a significant period of time, is still an open engineering challenge, so we don't really have any logical qubits at this point. If you want to be brutal about it, depending on the technology used to build the physical qubits, you need between 1,000 and 10,000 physical qubits to create one logical qubit, and because we only have on the order of a few dozen physical qubits today, you can safely say we don't even have one logical qubit. Mass manufacturing is not possible. Engineering-wise there are a lot of challenges, because you have to work with really exotic environments like super-low temperatures and very fine tolerances for the materials, so the engineering involved in building qubits is not at what you might call maturity.

Similarly, on the software side, everything is ad hoc: different devices have different programming interfaces, support different programming languages and libraries, and have different mechanisms for interfacing with them. There are no industry standards at this point. Even though I was involved in building the first language for programming a quantum computer, we aren't the only one, and our language, at the time anyway, was still relatively rudimentary. There's a lot of scope for improvement in terms of code optimization, type theory and so on; all of that is still very primitive.

It's interesting, because you might expect that after a few decades of people working on quantum algorithms, we would have a whole slew of algorithms we could employ. Not the case at all. Literally the entire canon of algorithms we know of that use quantum mechanical phenomena to do quantum computing fits on one webpage, and in fact that webpage is maintained by a single researcher. We are still struggling very hard to find new algorithms. Charitably, we can say we're hampered by the fact that there's no hardware that actually supports the development of new algorithms. So it's entirely possible that we'll get new algorithms for programming quantum devices; it's possible the inventors of those algorithms are in kindergarten right now, and we owe it to them to give them as much of a head start as we can in terms of hardware and infrastructure so that they can come up with them. But the reality is that at the moment we have a very small number of algorithms. We have demonstrated that quantum computing is real and has a significant advantage over classical computing.
But that comes with a caveat: the demonstration wasn't a real-world use case. It was a very contrived, engineered question, asked precisely so that the answer would come back as "yes, this is faster on quantum hardware than on classical hardware."

Now, what about the hard problems, the NP-complete stuff? We can, as I mentioned, solve the molecule simulation problem, which is the canonical quantum computing problem, but to do that we need a lot more qubits than we have: between 1,000 and 10,000 logical qubits to solve anything sufficiently useful. We would like to solve a whole bunch of really interesting problems around things like carbon capture, fertilizer manufacture, nitrogen fixation and so on, which we cannot do classically because the molecules involved are large and complex. We can estimate that with about 1,000 logical qubits we could start to do something useful. In terms of integer factorization, we've made some progress: we can successfully factor 35 into 7 and 5 on quantum hardware, and we occasionally even get the right answer, as is common in statistical computing. But you need at least around 4,000 logical qubits to crack a 2048-bit key, so we can safely say we're still some way off.

And in terms of NP-completeness, let me introduce the travelling salesman problem. This is a well-known problem, and it's very easy to state: given a bunch of cities, find the fastest way a travelling salesman can visit all of those cities and come back to where they started, where "fastest" means the shortest possible time, the shortest possible distance, or some other metric. Depending on how you lay the problem out, it turns out that even telling whether you've got the best solution is quite hard. Once you get beyond 50 or 60 cities, you start hitting some really hard limits; this particular problem scales worse than exponentially. But it's profoundly useful, as we said, and the kicker is that there is no known quantum algorithm that can claim to solve NP-complete problems any better than classical machines. That is to say, you don't get a quantum advantage on NP-complete problems. This is very important because, although the relationship between P and NP is still an open question, any claim of a quantum algorithm that somehow defeats NP-completeness is viewed with extreme suspicion, and usually for good reason, because there is still no quantum algorithm that has been shown to solve NP-complete problems any better than classical ones.

So, let's come to the meat of this talk. I want to spend the next 20 minutes or so walking through some really interesting problems, those NP-complete-type problems, and how to write some code using F# and, you know, other functional languages to actually attack them. I want to introduce you to the Ising model; here's a schematic of the Ising problem. The Ising model is a very good approximation of how magnetism works. If you think of each of these arrows as representing an electron spin, then depending on whether that spin points one way or the other, you get dipole moments that interact with the neighbouring electrons.
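To make that concrete, the energy being described here can be written in the standard Ising form (this is textbook notation, not anything taken from the slides):

$$E(s) = -\sum_{\langle i,j\rangle} J_{ij}\, s_i s_j, \qquad s_i \in \{-1, +1\}$$

where the sum runs over neighbouring pairs on the lattice. With every coupling $J_{ij} = +1$ the energy is lowest when neighbouring spins align, which is the ferromagnet; with arbitrary couplings you get the rugged landscape of the spin glass that comes up in a moment.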
As the system tries to order itself, all the spins flip in such a way that the system reaches a ground state. In this particular case we have very clear rules about the interaction pattern between each cell and its neighbours, and we define the lowest-energy state to be the one where all the spins are aligned; then we try to find a way to solve that problem.

Now, the classical way of solving this is to use a kind of Monte Carlo mechanism: you start with a random configuration, and then when you flip an electron you check whether the flip lowers the energy of the system. If it does, you accept it; if it doesn't, you accept it probabilistically, provided the temperature of the system is high enough to let you jump over a local minimum. With this simulated-annealing style approach you can pull the temperature down little by little and converge fairly quickly on a ground state, and that ground state represents the solution to the problem. In the case of a ferromagnet in two dimensions, the code is actually very straightforward; it's a nice piece of code to demo, and I'm going to show you what it looks like.

This code is written in F#. One interesting thing to point out is that we use units of measure to keep our numbers distinct. I have floats, some of which represent energy and some of which represent temperature, and I want to make sure I don't add a temperature and an energy together and call the result an answer. In this particular case, as I mentioned, we create the matrix that says: the energy between me and my neighbour is lowest when both our spins are aligned. We represent that by saying the coefficient between two nodes in this lattice is one when they are neighbours and zero when they are not. When the coefficients are constantly one like that, it represents a ferromagnetic model. If instead you give them a checkerboard pattern, the energy is lowest when every spin is anti-aligned with its partner, and that's called an antiferromagnet. And when the coefficients become arbitrary numbers between zero and one, you no longer have a simple energy landscape that drops straight into a ground state; you end up with a very complex landscape with many hills and valleys, and finding the ground state of that problem is NP-complete. That is called an Ising spin glass. So you can use the same program I've written here to solve the spin glass just by providing the appropriate coupling matrix.

If you look closely, the computation of the interaction energy and the whole solution together is literally about 125 lines of code; it's very concise. And, I don't know if you can read this since I've zoomed everything in, you can see that the units work out to energy per degree Kelvin, which is exactly what you want here. So we're able to stay type-safe and measure-safe while doing this computation.
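The source itself isn't reproduced in this transcript, but a minimal sketch of the two ideas being described, units of measure for energy and temperature plus a Metropolis accept/reject step, might look something like this (the identifiers are illustrative, not the ones in the actual code):

```fsharp
// Units of measure: the compiler rejects any attempt to mix these up.
[<Measure>] type J   // energy
[<Measure>] type K   // temperature, degrees Kelvin

// Boltzmann-style constant with units of energy per degree Kelvin.
let kB = 1.0<J/K>

let rng = System.Random()

/// Local energy of the spin at (i, j) against its four lattice neighbours.
let interactionEnergy (coupling: float<J>[,]) (spins: int[,]) i j =
    let n = Array2D.length1 spins
    let neighbours =
        [ (i + 1) % n, j
          (i + n - 1) % n, j
          i, (j + 1) % n
          i, (j + n - 1) % n ]
    neighbours
    |> List.sumBy (fun (ni, nj) ->
        -coupling.[i, j] * float (spins.[i, j] * spins.[ni, nj]))

/// One Metropolis step: flip a random spin, keep the flip if it lowers the
/// energy, otherwise keep it with probability exp(-dE / (kB * T)).
let metropolisStep (coupling: float<J>[,]) (spins: int[,]) (temperature: float<K>) =
    let n = Array2D.length1 spins
    let i, j = rng.Next(n), rng.Next(n)
    let before = interactionEnergy coupling spins i j
    spins.[i, j] <- -spins.[i, j]                      // trial flip (judicious mutation)
    let after = interactionEnergy coupling spins i j
    let dE : float<J> = after - before
    let accept =
        dE <= 0.0<J> || rng.NextDouble() < exp (-dE / (kB * temperature))
    if not accept then spins.[i, j] <- -spins.[i, j]   // undo the rejected flip
```

Note that dE / (kB * temperature) is dimensionless, which is exactly the kind of check units of measure buy you: a slip like adding an energy to a temperature simply won't compile.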
If we run this program, we can see that it finds a ground state that works. In this particular case the system really has reached a ground state; we're not going to be able to relax it any further, with all of these spins pointing one way and all of those pointing the other. It went from a chaotic set of spins to that settled pattern after 50,000 iterations. You can see the performance is relatively good: we did this with a 20-by-20 matrix in about 250 milliseconds.

Let's go back and take a look at what's cool about this code. One of the first things, as I mentioned earlier, is that the whole Metropolis algorithm, which is over here, is under 50 lines long, and it matches the mathematical description of what we're trying to do: compute the energy before the flip, accept if the energy has reduced, otherwise decide probabilistically whether to accept it, then flip the spin and see whether that gets you into a better state. This algorithm was published in a paper, and the code here is very close to the kind of pseudocode that was published in the paper. We get that because of how expressive and terse the language lets us be.

Units of measure give us safety. Like I said, we have two kinds of floats here, one representing temperature and the other representing energy, and we constantly keep adding things in terms of energy, but we don't want to be able to make the mistake of adding a temperature and an energy together. And we can't, because we annotate the energies appropriately, and the typed interaction-energy function returns a value whose type represents energy, as we expect. That gives us confidence in being able to reason about the code. Similarly, even when you do computations on these units of measure, the results carry sensible units for what the expression represents. And again, as I mentioned, this algorithm turns out to be very close to the expression of the algorithm in the mathematics paper, so when somebody wants to check or reason about the correctness of your code against the reference paper, asking whether this code represents what the paper describes, we're able to do that relatively easily because of the expressiveness of the language.

So, in summary, we've used mutation judiciously, so this isn't a purely functional system, but we've used the benefits of the language to get the performance, the clarity and the expressivity we need. We do have a type-system limitation, though: it would have been nice to make this dependently typed and bake a nice invariant based on the size of the matrix into the types, and there's no way to parameterize that with our type system. Indeed, there is a Rust implementation, which I wrote, that does parameterize over the matrix size, because the language allows us to do that, and compiling down to native with Rust gives an implementation that's about two times as fast as the F# one. But two times as fast is not actually that profound a difference.
Twenty times as fast would have been an indictment of the performance on the F# side of things; being able to write completely type-safe, managed code and still get reasonably fast solutions turns out to be quite possible in F#.

Now, let's come back to the travelling salesman problem. As I mentioned, the problem is easy to state: you have, say, 50 cities, you have a graph with weights between the cities, and you're asked to find the Hamiltonian cycle with the lowest total weight. It turns out this problem is super well studied. There are canonical problem sets you can go and grab, and I'll take you to the page at some point and show you what that looks like. There are lots of problem variations as well. So if you want to take a crack at the travelling salesman problem, then rather than looking at leaderboards, which tend to push you towards some kind of dynamic-programming model that will not scale at all, you can go to these problem sets, get some real data, and start solving real problems. And indeed that's what I did.

So I'm going to take a look at the code again, and we're going to use a genetic algorithm to solve it; we'll come back to the algorithm in a minute. The first part I want to show you is this: the problem specification is an eight- or nine-page document that tells us how the problem files are laid out, that is, what the format of the data input is. When you look at that, functional programmers have a very different way of approaching it. When we see a problem like that, we think DSL. We think: there's a domain-specific language here, and you're trying, in halting prose, to tell me what the grammar of this language is by giving me a written document. Well, I can do better; I have a type system I can represent this with. So I can say: look, here are all the things I expect. There's a name string, there's a dimension, a problem type can be one of these things, a node had better be a number, maybe zero-based where I'd want to consider a one-based system or one-based where I'd want a zero-based one, and then there are 2D coordinates, 3D coordinates, edges, tours, weights, and so on and so forth. Everything you told me in eight pages of text I have now put into about 120 lines of code, and now I have an expressive way to reason about your language: when I see a line in your data file, I can parse it into something I can actually operate on. In fact, I can build out a complete weighted graph based on the kind of problem you gave me, and given a tour I can walk that graph, compute the weight, and tell you how long that trip takes. That's going to turn out to be super useful. So we need to set ourselves up for success, and the first bit of that is understanding how to parse the data. And so then you can write a parser.
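The type definitions themselves aren't in this transcript, but to give a flavour of the approach, a sketch of that kind of DSL for a TSPLIB-style problem file might look roughly like this (the names and the exact cases are illustrative, not the real code):

```fsharp
/// A small slice of a TSPLIB-style problem header, captured as types
/// instead of several pages of prose.
type ProblemKind =
    | Tsp              // symmetric travelling salesman problem
    | Atsp             // asymmetric variant
    | Hcp              // Hamiltonian cycle problem

type EdgeWeightKind =
    | Euclidean2D
    | Euclidean3D
    | ExplicitMatrix   // weights listed explicitly in the file

/// Node ids in the file may be zero- or one-based; make that explicit.
type NodeId = NodeId of int

type Node =
    | Coord2D of NodeId * float * float
    | Coord3D of NodeId * float * float * float

type ProblemSpec =
    { Name      : string
      Dimension : int
      Kind      : ProblemKind
      Weights   : EdgeWeightKind
      Nodes     : Node list }

/// Once the spec is turned into a completed weight matrix, the length of
/// any tour (a permutation of node indices) is a one-liner.
let tourWeight (weights: float[,]) (tour: int[]) =
    tour
    |> Array.mapi (fun k city -> weights.[city, tour.[(k + 1) % tour.Length]])
    |> Array.sum
```

A parser combinator library (FParsec is a common choice in F#, though the talk doesn't name the one used) then turns each line of a data file into one of these values; that's the roughly 150 lines of parsing code described next.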
So here, again, I've taken the expressions that were outlined in the specification and built a parser that produces a structure I can traverse, so the data input becomes a data structure I can play with. And this code is actually quite nice. If you look at the first 300 lines or so of the program, it's really just the creation of the weighted graph, and then there's a parser with all the parse functions, which is literally another 150 lines of code. Most of that is diagnostic information that lets me go back and look at what kind of data is being parsed and how I got the values I got, so that I can test that everything works. So it's actually very simple to think about the problem this way, if you have a parser combinator library as one of the tools in your toolbox. As an FP person, having the ability to map the domain into a set of types, and then to quickly write a parser to read the data files, gives us an enormous amount of power to run with.

So let's take a look at the genetic algorithm for building a solution to the travelling salesman problem. If you think of a graph with n nodes, and then you think of a random ordering of the numbers from one to n, that random ordering represents a walk through the graph. It may not be the most efficient walk, but if you start at a given point and end at the same point, you've got a Hamiltonian cycle. Say you've got five nodes and your random number generator gave you the sequence five, three, two, one, four: that represents a tour that goes to the fifth city first, then the third, then the second, then the first, then the fourth, and then comes back to the fifth city. So all we need to represent a tour is an array holding an ordering of the numbers from one to n.

Given that, you can start working on a genetic algorithm, and this is what it looks like. In our system we create a population of random solutions; each of these random solutions is a random ordering of nodes, as I mentioned earlier. We rank them by weight, because once you have a candidate solution you can go through and compute how long that tour takes, and we put the best tours at the top of the list. Then we take this population, divide it at some point, and say: this part of the population is the elite population, and the others are not so elite. We pick parents from each of the groups and create a child from them: each of the child's genes comes from either the elite parent or the plebeian parent, and we bias towards picking more of the genes from the elite parent. That bias is the key piece. Then we decode the child's genes to get the ordering it represents and compute its fitness. And then we create a new generation where we keep all the elites from this generation,
a large number of the children we've just generated, which are presumably going to be better, and then, to break out of any local minima, some fresh blood: new random candidates, to see whether they can randomly turn out better and form part of the next generation's elite. You can repeat this evolution cycle many times and come up with a solution.

Notice that when you do this, you're not actually solving anything specific to the travelling salesman problem. The constraints of the problem are embedded in the encoding, but the solution isn't like a dynamic-programming one where you keep asking: is this better? Is this whole chain better than the previous chain? No, throw it back, backtrack a step, try something better, and so on. You don't have to do any of that. Here you let the system evolve towards a better solution stochastically.

So let's take a look at the code. As I told you, the parsing was one piece; now let's look at the genetic algorithm. Again, the description I just gave you of how to create a population is all in these 30 lines of code. It's literally right here: get the members of the population, get the elites, pick a random elite, get some random children, get some random mutants, and that builds the new population. And here, what we do is a parameterized crossover with a bias towards the elite parent: you pick the elite parent and the other parent and you go off and create the child.

Now, we take advantage of the fact that once a solution is built, it's completely immutable; the problem is represented in terms of immutable components, so parallelization literally falls out of the mix. We just go from Array.map crossover to Array.Parallel.map crossover, which lets us build the next generation in parallel. Similarly, because the elites are always carried into the next generation and the mutants are random anyway, and because each child comes back with its fitness already attached (we compute the fitness only once, since the candidate never changes), we're able to replace the sort with a three-way merge. In fact, that improves performance even further, taking us from an n log n limit to literally linear: you merge things in until you have the new elite population, which is the only part you care about being sorted, and after that the rest can stay unsorted and it doesn't really matter. So effectively, by looking at the problem and taking advantage of the immutability it offers, we immediately get very clean code that also leverages the problem's properties to be performant. In this particular case we can actually run it and show that it works. The F# advantage here is really that the properties of the problem lend themselves to immutability, so we can exploit parallelism and the fact that each candidate's fitness only needs to be computed once.
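The real implementation isn't reproduced here either, but the generation step being described, a biased crossover between an elite and a non-elite parent, fitness computed once at construction, and a parallel map to build the children, might be sketched like this (all identifiers are illustrative):

```fsharp
type Candidate =
    { Tour    : int[]    // a permutation of node indices: one Hamiltonian cycle
      Fitness : float }  // tour weight, computed exactly once (the tour never changes)

// Random.Shared is thread-safe, which matters inside Array.Parallel (.NET 6+).
let rng = System.Random.Shared

/// Child takes each gene from the elite parent with probability `bias`,
/// otherwise from the other parent; gaps are then filled with the unused
/// cities so the result is still a valid permutation.
let crossover (bias: float) (elite: int[]) (other: int[]) =
    let n = elite.Length
    let taken = System.Collections.Generic.HashSet<int>()
    let child = Array.create n (-1)
    for i in 0 .. n - 1 do
        let gene = if rng.NextDouble() < bias then elite.[i] else other.[i]
        if taken.Add gene then child.[i] <- gene
    let missing = [| 0 .. n - 1 |] |> Array.filter (fun c -> not (taken.Contains c))
    let mutable next = 0
    for i in 0 .. n - 1 do
        if child.[i] < 0 then
            child.[i] <- missing.[next]
            next <- next + 1
    child

/// Build a batch of children in parallel; every input is immutable, so the
/// only shared state is the thread-safe RNG.
let breedChildren fitness (bias: float) count (elites: Candidate[]) (others: Candidate[]) =
    Array.Parallel.init count (fun _ ->
        let e = elites.[rng.Next(elites.Length)].Tour
        let o = others.[rng.Next(others.Length)].Tour
        let tour = crossover bias e o
        { Tour = tour; Fitness = fitness tour })
```

Swapping a sequential Array.map for Array.Parallel.map (or Array.Parallel.init, as here) is essentially the whole parallelization story, precisely because nothing in a Candidate is ever mutated after construction.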
Computing the fitness only once and replacing the full sort with a linear merge both fall out of that immutability, and the linear merge really is faster than an n log n sort. So, as I mentioned, this is another example where the code that's actually written follows very closely the specification written in the paper; the entire evolution step is only about 30 lines long, and it exploits the parallelism that comes inherently with the problem. The summary for this problem is that it's a showcase of making parsers and DSLs part of your toolkit, using immutability in the solution when the problem offers it, and keeping the code close to the specification and readable.

The code is still fast, and I can give you a quick demo. For example, let me run this code... yes, that's more like it. So this is the travelling salesman problem: we ran 5,000 generations, we kept the elite population at 25%, added 15% mutants, and took 75% of the elite parent's chromosomes as opposed to 50%. We started with a population of 96, we had an initial random fitness of about 102,000, and a final fitness of about 27,000. So we can do this very fast. In fact, if we want to see what that looks like, we can take a look at the graph we've written out. It gives you a sense of how, just by reinforcing the good characteristics of some solutions over others, you end up with a solution that converges fairly quickly. As you can see, we started at around 100,000 and came down to about 25,000 to 30,000 quite quickly, exponentially fast in fact. You'll also be able to tell from the code that we didn't pay for the expressivity by sacrificing performance.

And that's the conclusion. The key points we covered: we're approaching the limits of Moore's law. Quantum computing offers advantages for some problems that classical computing can't solve, but we're at a stage where we're not all that far along in terms of solving real-world problems, so the hard problems stay insoluble at this point. Optimization scales badly, and the NP-complete problems are all equivalent. We evolved solutions for two NP-complete problems, the Ising spin glass and the Hamiltonian cycle of the travelling salesman problem, using two nature-inspired approaches, implemented in F# and leveraging the benefits of the language to do so.

So that's my talk. Here are some links; I'll post these later, and you're welcome to come and ask me for them. This information is available if you follow me and find the nature-inspired optimization repo there. I think we're out of time, we have maybe one minute for questions, but I'll pop over to the table afterwards, so do come hang out with me. Thank you very much.

Thanks, John. Thanks a lot for your talk.