Online is Professor Michielsen from New York. Kristel, are you there? Yes, I can see you. Yes, good morning. Good morning. So you can start sharing. She will talk about quantum annealing for optimization and classification. That is 40 minutes plus five minutes for questions. Okay, when you are ready, I will start. Do you agree to the recording? I mean, it would be... Yes, perfectly fine. Okay, good. I'll start in a second. So can you please confirm that you see my screen? Yes, I see your screen, yes. Okay, perfect. So good morning. First of all, thank you very much for the invitation to give this presentation here. Unfortunately, I could not come to your beautiful place because of other commitments that I had. So I will talk about quantum annealing. As you all know, quantum computing in general has big potential. The applications are, first of all, quantum simulations: one can think of simulating quantum systems on a quantum computer, with applications in quantum chemistry. Then one has optimization, and optimization problems one can find everywhere, from traffic optimization to optimization in drug design and so on. And as optimization is usually a first step in machine learning, another application is quantum machine learning. In this talk, I will present two of our works on optimization problems that are solved with a quantum annealer, and also one in the direction of quantum machine learning, on the classification of images. So, first optimization. We have the D-Wave quantum annealers, which are the big commercial quantum annealers that are available. They are built to solve quadratic unconstrained binary optimization (QUBO) problems, of which you see the mathematical expression here. We have some problem variables, which are binary variables taking the values zero and one.
In this expression, we have two kinds of coefficients: the bias, a real number, which appears in the single-variable term, and the coupler, which couples two of these variables; the couplers are also real numbers. So why might it be interesting to solve this type of problem with an annealer? First of all, discrete optimization is a hard problem. Second, the annealer can produce many solutions simultaneously, which can lead to a speedup. Another nice advantage is that such a machine has a very low energy consumption compared to traditional supercomputers. The qubits are of course connected for the computation; you see this schematically here in this graph. Here we have our coefficients a, and the couplers between the three qubits. We have lately been using two generations of quantum annealers for our work, the D-Wave 2000Q and the D-Wave Advantage system. As you all know, the two have a different topology of the architecture of their quantum processing units. The older one has the Chimera architecture; the recent one has the Pegasus topology. The 2000Q has about 2,000 qubits, the Advantage more than 5,000, and the number of couplers changed from about 6,000 to 35,000. And because I was talking here about the connectivity between the qubits: the 2000Q had a maximum connectivity of six, and the Advantage has a maximum connectivity of 15. All these numbers are larger, and this also results in larger optimization problems that can be solved on the more recent system. The chip architectures of the quantum annealers and their quantum processing units are then visualized as these graphs. Here we see the Chimera graph, and here the Pegasus one, which actually includes a Chimera graph, the one here. All nodes, indicated by the blue dots, are the qubits, and the black lines are the couplers, which are a result of the connectivity of the hardware.
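The QUBO cost function just described can be sketched in a few lines. This is a minimal illustration, not D-Wave code: E(x) = Σᵢ aᵢxᵢ + Σᵢ<ⱼ bᵢⱼxᵢxⱼ with binary xᵢ, where the names a (bias) and b (coupler) follow the talk and the tiny three-variable instance is invented.

```python
from itertools import product

def qubo_energy(x, a, b):
    """Energy of bit string x given biases a and couplers b."""
    energy = sum(a[i] * x[i] for i in range(len(x)))
    energy += sum(b_ij * x[i] * x[j] for (i, j), b_ij in b.items())
    return energy

# Toy instance; brute force works because the problem is tiny,
# whereas the annealer targets exactly this form at large scale.
a = {0: 1.0, 1: -2.0, 2: 0.5}
b = {(0, 1): -1.5, (1, 2): 2.0}
best = min(product([0, 1], repeat=3), key=lambda x: qubo_energy(x, a, b))
# best is the bit string with the lowest energy
```

On the hardware, the role of this brute-force minimization is played by the annealing process itself.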
Now, of course, these graphs, or the architecture of the chip, limit the size and the connectivity of the optimization problems that fit on the chip. We can have problems that do not require an embedding on the chip, but in most cases the QUBO has to be mapped using an embedding, and then usually we have to use more physical qubits for one logical qubit. But in general we can say that if we have a QUBO, the variables map onto the qubits and the interactions between the variables map onto the couplers. So recently we have been studying, and this is work that has been published but that we did not yet put on the arXiv, three different types of quadratic unconstrained binary optimization problems. The first class of problems that we studied are 2-SAT problems, which have a unique ground state and a highly degenerate first excited state. This is a class of problems that we have been studying for a long time, because these are problems that are hard to solve with annealers, but easy to solve with our traditional computers: in that case, the solution time is of the order of N, where N is the number of variables. Now, when we translate such a problem to a QUBO, the graph representing the QUBO is not fully connected. The second class of problems that we studied are fully connected spin glasses, where the values of the J and h parameters are uniform pseudo-random numbers drawn from the interval [-1, +1]. We know that these are problems that are very hard to solve on traditional computers. Then, by studying these problems and putting them on supercomputers to solve them, we observed that there is what we call a fully connected regular spin glass model with this type of parameter setting that is very easy to solve.
So what we then did is, for all classes, we took small problems with up to 20 variables, and we did an exact enumeration to find the lowest 6037 states. Now, when I say we take small problems: for the 2-SAT problems this is more or less the biggest problem size we can come up with. It is not easy to find 2-SAT problems having these properties, and finding them requires a lot of compute resources. Then we studied these problems, and we looked at the Hamming distance, that is, how many spins we have to flip in order to come to the ground state. Here we have the distribution of the Hamming distance, taking into account the 6037 states. You also see the gap between the ground state and the excited states, and what is indicated here is the frequency. For these 2-SAT problems, we only have the ground state and all the rest are first excited states, so we only have one gap value, with a huge frequency. For the fully connected spin glass systems, we have this distribution of the Hamming distance. We learned from this that it is easier to solve these problems, because here the Hamming distance is peaked around five, so one could say that on average one only has to make these five spin flips. The distribution of the gaps also looks different. Then, for the spin glass model that is very easy to solve, we have this distribution of the Hamming distance and this distribution of the gap values. Here you also see that we have a lot of gaps that are very small, with a huge frequency. We then used the D-Wave Advantage system to solve these problems, and we looked at the success probability as a function of the number of variables, for problems with up to 180 variables. And then we get this curve for the success probability.
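The Hamming-distance analysis just described can be sketched as follows: for each low-energy state, count how many spins differ from the ground state, then histogram those counts. The example states here are invented for illustration, not the enumerated states of the talk.

```python
from collections import Counter

def hamming(s, t):
    """Number of positions in which two bit strings differ."""
    return sum(si != ti for si, ti in zip(s, t))

# Invented ground state and excited states for a 4-spin toy system.
ground = (0, 1, 0, 1)
excited = [(1, 1, 0, 1), (0, 0, 1, 1), (1, 0, 1, 0)]

# Histogram of Hamming distances to the ground state.
distribution = Counter(hamming(ground, s) for s in excited)
```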
If we look at the curve for the fully connected spin glass problem, we see that for problems with up to 180 variables we have this shape. We made some fits, and then we can compare the exponents: here we find 0.090, and here we find 0.163. From this, and from these distributions, we see that there is a correlation between this exponent and these distributions. So this problem is harder to solve than this one on a quantum annealer. Then, if we go to the 2-SAT problem with the property that it has a unique ground state and a highly degenerate first excited state, of which we already know that it is hard to solve on an annealer, we get this curve for the success probability. As you see, we can only go to problems with up to 20 variables, but the exponent changed to almost 0.300. So we see that there is a clear correlation, and, of course this is for small problems, but we have a nice scaling here. So we believe that even if we make these problems larger, we can predict how the annealer will behave for them. That was the first application, for the more theoretical type of optimization problems. Let me now come to a potential application, which is called the tail assignment problem. This problem comes from the airline industry. Airlines face the daily problem that they have about 1,000 flights per day, to more than 150 cities in more than 70 countries, and in their fleet they have hundreds of aircraft of different types. The question in the tail assignment problem is to come up with a mathematical optimization model that, when solved, can provide airlines with efficient plans for how to use their aircraft.
The most costly items are, of course, the costs associated with the aircraft itself, but also with the flight crew. Those are the most significant costs, which one wants to minimize. These tail assignment problems for the airline industry are complex problems, and they are also very big. This means that we cannot solve them with the quantum annealers and quantum computers that we have nowadays, so we have to simplify the real-world problems. This one can do, for example, by finding routes to carry out a given number of flights between the airports such that the routes do not overlap. The first step in bringing the tail assignment problem to the annealers, or also to the gate-based quantum computers, is to make a mathematical formulation, and for the tail assignment problem this corresponds to a linear assignment problem. What we have to do is minimize this expression, subject to this type of constraint. The first expression here encodes the costs of assigning aircraft to routes: the factor f contains the cost model per route, and the x are the routes. Then we have a constraint matrix A, and this expression encodes the constraints that one has; in this case we want to have one aircraft per route. In this constraint matrix, the elements take values zero or one, and they simply indicate whether a flight is part of a route or not. We then have this mathematical formulation, which we want to translate into an Ising problem. So we go from our QUBO to an Ising problem by using this transformation, where the x took values zero and one and the s now takes values plus one and minus one, as in this expression here. Here I also want to make the remark that when this value lambda takes the value zero, we have an exact cover problem, and then the objective of the whole problem is to find any feasible solution, not necessarily the optimal solution.
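The QUBO-to-Ising change of variables mentioned above is just x = (1 + s)/2, mapping spins s ∈ {-1, +1} to bits x ∈ {0, 1} (equivalently s = 2x - 1). A tiny self-contained sketch:

```python
def s_to_x(s):
    """Map an Ising spin in {-1, +1} to a QUBO bit in {0, 1}."""
    return (1 + s) // 2

def x_to_s(x):
    """Map a QUBO bit in {0, 1} to an Ising spin in {-1, +1}."""
    return 2 * x - 1

# Round trip over all values confirms the two maps are inverses.
assert all(x_to_s(s_to_x(s)) == s for s in (-1, +1))
assert all(s_to_x(x_to_s(x)) == x for x in (0, 1))
```

Substituting x = (1 + s)/2 into the QUBO expression is what produces the Ising h and J parameters referred to throughout the talk.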
Finding any feasible solution, rather than the optimal one, is very often of interest in practical problems, because there we are not always looking for the optimal solution; any solution that already reduces the costs is a valuable solution. Here I also want to mention that this type of research, as I said before, is close to a real-world problem, and it resulted from a collaboration with Boeing in the OpenSuperQ project, one of the projects in the European Quantum Flagship, which had the goal to build a superconducting quantum computer. So it was in this project that we looked into this type of problem, in principle for gate-based quantum computers, but we also solved it on the D-Wave quantum annealer for cross-platform benchmarking. We looked at these reduced problem instances, which are exact cover problems, and the problems we looked at were a series of realistic problem instances, obtained by random sampling from real-world data, with up to 40 routes. This random sampling from real-world data is just a step in between, because the company Boeing, understandably, did not want us to have a direct connection between the problems we are looking at and their real-world problems. But for the hardness of solving the problems it does not matter too much; that is exactly the same. So I will now discuss one example. We have 40 routes, each of which contains several out of 472 flights. The question asked here is to find routes to carry out all these 472 flights between the airports such that the routes do not overlap. We studied the problem with 40 routes, corresponding to a 40-qubit problem. We looked at a problem instance with one unique ground state, which is then the solution of the optimization problem. Here you see the ground state, which has nine qubits with the value one.
So we have a solution with nine routes, and each route is assigned to an aircraft. All other states that are found represent invalid solutions, which means that these 472 flights are not covered exactly once, which in practice means these would be solutions with a higher cost for the airline company. To give you an idea of how big this problem is, we made a small visualization. Here you see the airports that we took into consideration, distributed over the whole world. Then we have the network of 472 flights between the airports that have to be carried out, and here you see the solution with nine routes that cover these 472 flights exactly once. This is just to indicate that this is not a problem that one simply solves by hand; it is already a little bit more complicated. To give you a further idea about the size of the problem and what the annealer, or the solver, has to do: if we have to find the optimal flight schedule such that each flight is covered exactly once, for 40 routes we have 40 qubits, so we have about 10 to the power 12 possible selections. If we make our problem bigger, scaling it up to 120 routes, corresponding to a 120-qubit problem, then we have about 10 to the power 36 possible selections. So it scales fairly rapidly. Now we have translated this problem making use of this constraint matrix, which is nothing else than this table here, where the rows of the matrix represent the routes. So we have a matrix with 40 routes, the columns are the flights, and what is indicated in blue are the routes that are part of the optimal solution. This is the matrix with the zeros and the ones, and we have to check that the sum of the selected rows is exactly one in every column. Here I show you the coupler graph: each non-zero value of J corresponds to a black line between the 40 qubits.
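The exact-cover condition on the constraint matrix can be sketched as follows: with rows as routes and columns as flights, a selection of routes is feasible if and only if every flight appears in exactly one selected route. The tiny 3-route, 4-flight matrix below is invented for illustration.

```python
def is_exact_cover(A, selected):
    """A[r][f] == 1 iff flight f is on route r; check that each
    flight is covered exactly once by the selected routes."""
    n_flights = len(A[0])
    return all(
        sum(A[r][f] for r in selected) == 1
        for f in range(n_flights)
    )

A = [
    [1, 1, 0, 0],  # route 0 covers flights 0, 1
    [0, 0, 1, 1],  # route 1 covers flights 2, 3
    [0, 1, 1, 0],  # route 2 covers flights 1, 2
]
```

Routes {0, 1} form an exact cover here, while {0, 2} cover flight 1 twice and flight 3 not at all; all non-covering selections correspond to the invalid, higher-cost states mentioned above.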
From this coupler graph you see that the problem we are solving here is an almost fully connected problem, and this almost fully connected problem then needs to be embedded on the D-Wave architecture, which is far from fully connected. And here, just to give you an idea, I also show you the distributions of the values of the Ising parameters for this problem. We first solved problems with 30 to 40 qubits, where 90% of the couplers have non-zero values. What we did is make a scan over 10 different embeddings and 20 relative chain strengths. We used the 2000Q with the Chimera graph to solve these problems, and as a comparison we also used the Advantage system with the Pegasus architecture. I show you here results for 30, 36 and 40 qubits. What you see is the success rate as a function of the relative chain strength. For 30 qubits, of course, the success rate is quite high. For 36 qubits, this already reduces strongly on the Chimera system, less so on the Pegasus. For 40 qubits, we have a low success rate on the Chimera graph, but still a very reasonable one on the Advantage system. We then took the best embedding and chain strength, and made a scan over the annealing time for the problems with different qubit sizes. The observation is that the success rate increases as the annealing time is increased, but for the larger qubit problems we see a clear advantage of the Advantage system, which is represented by the blue line. We then also studied larger problems, with up to 120 qubits. These problems are larger, but they are sparser, because only 20% of the couplers are non-zero. Then we looked at the fastest successful runs that reproducibly gave a solution, so here you have the QPU access time as a function of the number of qubits, and the success rate as a function of the number of qubits, for the two different systems.
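The relative chain strength scanned here is, as the speaker explains in the Q&A at the end, the chain strength divided by the largest absolute value among the h and J parameters of the problem. A minimal sketch, with invented parameter values:

```python
def relative_chain_strength(chain_strength, h, J):
    """Chain strength normalized by max(|h_i|, |J_ij|)."""
    scale = max(max(abs(v) for v in h),
                max(abs(v) for v in J.values()))
    return chain_strength / scale

# Invented Ising parameters for a 3-spin toy problem.
h = [0.5, -1.0, 0.25]
J = {(0, 1): 2.0, (1, 2): -0.5}
rcs = relative_chain_strength(1.0, h, J)  # 1.0 / 2.0
```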
From these results we see that the Advantage system, as one can expect, can solve larger problems than the 2000Q system. It also solves the problems faster; but if the 2000Q can solve a problem, then we observed that sometimes the success rate is higher than when solving it with the Advantage system. And this is something which we have observed regularly, also for other types of optimization problems. Now, these are results for the quantum annealer. Because we were studying this problem in the context of a project that was developing a gate-based system, it is also good to do cross-platform benchmarking with an optimization algorithm on a gate-based system. There we have the quantum approximate optimization algorithm (QAOA), which is designed for this. This type of quantum algorithm is a variational algorithm, and it is also hybrid, meaning it consists of two parts: one part is a quantum algorithm, which in this case has to evaluate the energy expectation value of a quantum register, and the second part is a classical optimization algorithm, which has to optimize the parameters of the parametrized unitary transformations that are used in the quantum part. Now, because gate-based quantum computers suffer from a lot of errors, and because they are still relatively small, for the comparison here we did not use a real device, but an emulator which runs on a supercomputer. For this, we used the GPU version of the Jülich universal quantum computer simulator JUQCS. We took the quantum algorithm and let it be carried out by the GPUs of the JUWELS Booster; in this case, the GPUs play the role of the quantum processing units. The classical optimization algorithm we gave to the CPUs of the JUWELS Booster, and the problem we studied was the simplified tail assignment problem which I have been talking about before.
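The hybrid structure just described can be caricatured in a few lines: a "quantum" part that returns an energy expectation value for given variational parameters, and a classical optimizer that updates them. Both pieces below are stand-ins, not the actual QAOA circuit or the JUQCS interface: the energy function is a toy landscape and the optimizer a simple finite-difference descent.

```python
def energy_expectation(params):
    """Stand-in for the quantum part (in the talk: emulated on GPUs)."""
    gamma, beta = params
    return (gamma - 0.3) ** 2 + (beta - 0.7) ** 2  # toy landscape

def optimize(params, steps=200, lr=0.1, eps=1e-6):
    """Stand-in for the classical part: gradient descent with
    forward finite differences."""
    for _ in range(steps):
        grads = []
        for i in range(len(params)):
            shifted = list(params)
            shifted[i] += eps
            grads.append((energy_expectation(shifted)
                          - energy_expectation(params)) / eps)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params

gamma, beta = optimize([0.0, 0.0])  # converges near (0.3, 0.7)
```

In the real setup, each call to the energy evaluation is a full circuit emulation, which is why the approach is limited to a few tens of qubits.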
Here we could only look into problems with up to 40 qubits. Why? Because the combination of this emulator and the JUWELS system limits us to problems with up to 43 qubits. Here you see the results, the success rate as a function of the number of qubits. We can calculate the success rate because for these problems we know the exact ground state. The blue triangles are the results of the quantum approximate optimization algorithm, the standard version as you might all know it. What we see is an exponential decrease of the success rate as a function of the number of qubits, which is a bad message, because what we have been doing here is solving this problem with what one could call an ideal digital quantum computer, since we solved this optimization problem with the help of the quantum computer emulator on the JUWELS system. Now, we were not satisfied with these results, and as we are also looking into algorithm development, we asked ourselves the question: can we improve this? We then came up with an approximate quantum annealing algorithm, a new algorithm which is better than QAOA, at least for this problem set, in the sense that we got rid of this exponential decrease. You see the success rate is still low, but at least we got rid of the exponential decrease. Then, for your information, I also give once more, for comparison, the results which we obtained with the annealer when we put the identical problems on it: for the 2000Q system we found a success rate of 10% for these problems, and for the Advantage system we found a 70% success rate. I also want to show you some results about classification. For this, we have put a version of the support vector machine on the quantum annealer, so we went from the classical support vector machine to a quantum support vector machine. A support vector machine is a supervised machine learning method for binary classification.
One starts from a training set, which contains feature vectors and labels. The support vector machine is then trained by solving this quadratic programming model, where the alphas are continuous variables. We also have a kernel K here, which depends on the data features, and then we have some constraints. Now, on the D-Wave annealer we solve QUBO problems. These are also quadratic problems, but there we do not have continuous variables; we have binary variables. So this means we have to make an encoding from the alphas to binary variables, and for this we use this expression with the base and the exponent. We then come to a new energy expression for the cost evaluation: we have here our familiar QUBO formulation if we make this translation. Now, as a first application, we used a toy model with toy data. The toy model is the following: we have a set of 40 points, and the question is to make a distinction between the inner points and the outer points. Here you see the solution with the classical support vector machine; we get this black line here. These red points are close to each other, and these blue ones are further away from each other. A property of the classical support vector machine is that one gets the global minimum, but this is only guaranteed for the training data. Then we put the same problem on the quantum annealer, and then we get several solutions. Here you see three of the solutions that we get by letting the annealer solve the problem. The first solution resembles very much the one of the classical support vector machine. These ones are different, and they take into account different properties of the data. This one, for example, takes more into account the distances between the outer points.
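The binary encoding of the continuous alphas mentioned above, with a base and an exponent, amounts to representing each alpha as alpha = Σₖ Bᵏ aₖ with binary aₖ. The choices K = 3 bits and base B = 2 below are illustrative, not the values used in the talk.

```python
def decode_alpha(bits, base=2):
    """Reconstruct one continuous alpha from its K binary variables:
    alpha = sum_k base**k * bits[k]."""
    return sum(base ** k * b for k, b in enumerate(bits))

# With K = 3 bits and base 2, alpha can take the values 0..7.
values = sorted({decode_alpha((b0, b1, b2))
                 for b0 in (0, 1) for b1 in (0, 1) for b2 in (0, 1)})
```

Substituting this expansion into the quadratic program is what turns the continuous SVM training problem into the QUBO form the annealer accepts.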
So we can say that with the quantum support vector machine we have additional higher-energy classifiers from an ensemble of solutions. What we then do is combine these classifiers, and what we have seen is that this can give better results. Why can this give better results? Because it takes more properties of the data sets into account. We then applied this to a problem in computational biology. For this, we looked into the problem described in this paper here. This was a classification problem, and the question was whether transcription factors (proteins) bind to DNA or not. Here you see results which we obtained with the quantum support vector machine classifier, and here with the classical support vector machine. What we observed, and this is typical, is that with the quantum support vector machine we have an ensemble of solutions in one shot. The combined classifier, if we take several of those, generalizes well to unseen data, so not the training data. The issue is that up to now we can only do this for small data sets, so the problem size is still small. And we must say that if we look into other problems or applications, combining these classifiers does not guarantee that one comes to a better result. For this application, yes; for other applications, not always. So this is not guaranteed. Kristel, sorry, I think we should move towards the conclusion. Yes, so as a conclusion, I want to show this last application, which is a classification of remote sensing data. What you see here is a picture which shows part of the city of Lyon. It is a false-color picture, and we want to make a binary classification. Here you see the ground truth: the red structures are the buildings, the black is all the rest. Here you see the classical support vector machine result, and here the quantum support vector machine result.
So in this case, you can also see that there is quite good agreement between both pictures. We can even say that in this particular case, the quantum version gives a better result than the classical one. What we are doing right now, and that is work in progress, is a multi-class classification of this type of earth observation pictures. And with this, I would like to conclude our research overview. Thank you very much. Kristel, time for a few questions. Hi, I actually have a question. Yes, please. At the start of your talk, when you were talking about this Hamming distance between the spins, you said that you flip a few spins to change between chiralities. What do you mean by chirality? I was not talking about chirality here. What I was talking about: if you have your states, then they are represented by these bit strings, and you can calculate what the Hamming distance is between two states if you do these spin flips. So I was not talking about chirality. Okay. I also have a question about the effect of this relative chain strength and embedding. How do these different embeddings work, what exactly is this relative chain strength, and how does it affect the embedding? We have certain embeddings, and therefore it is better that I share my screen again to give you the expression for the relative chain strength. I can also do it like this, it is faster. So here you have this relative chain strength: it is the chain strength divided by the maximum of the absolute values of the h and J values. One actually has to play with this. What we do here is we have different embeddings, and then one plays with the chain strengths for the problems, and then you see that you get different success rates. So this relative chain strength is also a parameter, just as the annealing time is, for solving your problems. Thank you, that is clear. Hi, yeah. Could you go back to slide six, please? Here is the example.
Yeah, so okay, we were on that slide, number six. Yes, I wanted to make it full screen. So the question I had about it is simply: did you optimize the anneal time, or did you vary the anneal time, in order to deduce the exponents that you were showing there? The exponents are probably pretty sensitive to optimizing the anneal time. Yes, as is always the case. The success rates are a function of these parameters, the anneal time and also this relative chain strength. What we do is we look at best values: we make a test and then we look for the best values. But I unfortunately do not know by heart, now that we are here, what the annealing time was to come to these success probabilities. But we take long enough annealing times and suitable relative chain strengths; we first do some optimization there. Okay, then I had a general question about many of the results that you presented. In some cases you found these impressive success probabilities, for example in the tail assignment problem. But quite generally, did you try to benchmark your quantum results, whether on D-Wave or the simulation of QAOA, against the best-in-class classical solvers for the same problems? For example, for tail assignment, I imagine that airlines have invested a lot into really good classical heuristics. So how do the results compare to best-in-class classical solvers? These problems, at least the size of problems we are looking at, can be easily solved on supercomputers, so that is no competition. And that is also not the goal of the work that we were doing here. For the type of problems we were looking at here, that is a given. Our question was: can we put these problems onto these quantum computers, and what does it mean? And what does it mean if we do a cross-platform benchmarking between different types of quantum computers?
So it is not that we claim here any quantum advantage in looking into these problem sizes. Okay, thanks. Okay, I think we have to move on. So let's thank Kristel again for her talk.