OK, thank you very much. It's nice to virtually be in Trieste right now. I'm going to talk about the Quantum Utility Highway. We're running out of adjectives here, but this is joint work with my colleague at D-Wave, Pau Farré. What we mean by quantum utility is the true cost of accessing quantum computation as experienced by a user in typical use cases.

As everyone knows, quantum annealing works by the declarative paradigm. Instead of implementing an algorithm to solve your problem, the algorithm is already implemented, and you translate your problem into a formulation that is compatible with the already-implemented solver. That translation creates overhead: there's computation time required to do it. That's the type of overhead we're looking at. There are also overhead costs due to layering, and we want to know whether it would be cost effective to swap out the quantum solver and replace it with a classical alternative at different layers of these overhead costs.

In particular, in the long run, we want to show that even when you consider these overheads, the quantum solution is outperforming classical alternatives. On the way to getting there, we can work outwards with increasing overheads and establish early milestones. Today I'm not going to talk about Milestone 0, pure anneal time with no overheads; that will more or less follow from the next two milestones. I'm going to talk about the overhead of chip access, which is programming and readout time, basically, plus the delays built into that, and the additional overhead that is the indirect cost of minor embedding: a logical solver can work on the original logical problem of size n, but that problem has to be minor-embedded onto the chip, which creates a larger problem of size q. The question is, what's the penalty for having to increase your problem size? There are more milestones ahead, but I'm going to talk about these two today.

For Milestone 1, on chip access, this is just a cartoon that shows generally how things go. You see on the bottom here the quantum annealing: if you want to sample 200 solutions in 100 milliseconds, a large part of your time is spent on what this line indicates, the programming time. The chip programming time really dominates the computation. The actual anneal time is about half the width of one of these dots as I've drawn it, so the anneal time is very fast compared to the total cost of computation. You can get quite a few samples in 100 milliseconds, about 200 in the scenario I drew, but a lot of that time is simply chip programming time.

The teal dots at the top are something like a greedy algorithm that just falls down to a local minimum as quickly as possible and then starts over. It's fast, and it produces solutions from some probability distribution, but they're not necessarily optimal, certainly. Heuristics like simulated annealing have a parameter, the number of sweeps, and you can adjust that. If you're only interested in getting one solution, it's usually most cost effective to set the thing to anneal very slowly, with a huge number of sweeps.
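To make the declarative workflow and the sweeps trade-off concrete, here is a minimal sketch using the Ocean packages dimod and neal, with classical simulated annealing standing in for the already-implemented solver. The toy QUBO and the parameter values are illustrative assumptions, not the setup used in the talk.

```python
# Minimal sketch of the declarative workflow: the solver is already implemented;
# the user's job is to translate the problem into a compatible form (a QUBO).
# Assumes the Ocean packages `dimod` and `neal` are installed; all values below
# are illustrative, not the ones used in the talk.
import dimod
import neal

# Toy QUBO: minimize x0 + x1 - 2*x0*x1 (ground states are x0 == x1).
Q = {(0, 0): 1.0, (1, 1): 1.0, (0, 1): -2.0}
bqm = dimod.BinaryQuadraticModel.from_qubo(Q)

sampler = neal.SimulatedAnnealingSampler()

# The sweeps trade-off: one slow anneal with many sweeps, versus many faster
# anneals with fewer sweeps in (roughly) the same time budget.
one_slow = sampler.sample(bqm, num_reads=1, num_sweeps=10_000)
many_fast = sampler.sample(bqm, num_reads=10, num_sweeps=1_000)

print("one slow anneal, best energy:", one_slow.first.energy)
print("ten fast anneals, best energy:", many_fast.first.energy)
```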
And you could imagine that if you extend this even longer and have more time, it might converge to even better solutions. On the other hand, if you want to use simulated annealing to get 10 solutions within the same time period, the anneals have to be shorter, and you're going to get worse-quality solutions, basically. So there's a trade-off between the number of sweeps and the number of samples you can take in a given time.

This is the scenario in which we're going to look at performance. The QPU time is going to include what we call access time: programming plus readout. In this test, all of the solvers will be reading physical inputs defined directly on the Pegasus graph, and we're looking for the best median solution in a sample of size S within a time limit T, for various values of S and T. For Milestone 2, in addition to the access time in the scenario I just described, we're also going to consider the indirect cost of minor embedding. That basically means that some classical solvers are able to read an input of arbitrary structure with n variables, but before the quantum processor can solve it, the input has to be mapped onto the Pegasus graph, in this case. That adds variables, so the quantum processor is reading a much larger input, and we're looking at the cost involved there.

Now, we all know that with this lower bound on programming time, classical solvers run faster or slower on different inputs, but when n is very small, with their nanosecond-scale instruction times, a classical solver can find optimal solutions while the quantum system is still programming. So we're looking for n large enough that there's a chance the quantum processor can outperform classical.

We used one experiment to study both of these questions. Here's how it works. We have one quantum processor, an Advantage processor that's currently online. We looked at four classical solvers that read arbitrary inputs, and two physical solvers, implemented on GPUs, that read Chimera-structured inputs only. That being said, all solvers can read all of the inputs, either pre-embedding or post-embedding, and if they're native, that's the same thing. We assume a typical use case for heuristic optimization: all the parameters are fixed or auto-tuned, since users are not interested in spending more time tuning than they spend actually solving the problem. We generated 25 inputs each from 13 different input classes; five of them are native and eight are logical input classes. In all cases, we're looking at the largest n that fits on the current chip according to a specific embedding policy.

We set up these tests for a variety of runtimes, from 20 milliseconds up to one second, and for four sample sizes, from one up to 1,000 samples. That gives you 24 combinations of T and S, and five of them, with very short runtimes and very high sample sizes, we omit because they can't be served by the quantum processor due to the overhead costs. So that's the test we're running. What we're looking at is the best solution found, or rather the best sample median found, in time T, which in the case of a sample of size one is simply the best solution found.

For the inputs and the solvers, here's a list of the inputs we used. Five of them are native, mapped directly onto the Pegasus graph, and eight are logical problems motivated by real-world applications.
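To make the n-versus-q embedding overhead concrete, here is a rough sketch using the Ocean tools minorminer and dwave_networkx. The graph sizes are illustrative assumptions (a small clique on a small Pegasus graph), not the instances from the talk, and the embedding search is heuristic, so it can occasionally come back empty.

```python
# Rough illustration of the indirect cost of minor embedding: a logical input
# with n variables becomes a physical input on q >= n qubits once chain qubits
# are added. Assumes `networkx`, `dwave_networkx`, and `minorminer` are
# installed; the sizes here are illustrative only.
import networkx as nx
import dwave_networkx as dnx
import minorminer

logical = nx.complete_graph(20)   # n = 20 logical variables (a small dense clique)
target = dnx.pegasus_graph(6)     # a small Pegasus target, not a full chip

# Heuristic embedding search; returns {} if no embedding is found.
embedding = minorminer.find_embedding(logical.edges, target.edges)

n = logical.number_of_nodes()
q = sum(len(chain) for chain in embedding.values())  # physical qubits used
print(f"logical size n = {n}, embedded size q = {q}")
```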
We have three physical solvers. One of them is quantum, and two are simulated annealing implemented on GPUs; one has its parameters set more for optimization and one more for sampling, so we can cover both scenarios. Then we have four classical logical solvers that in all cases either use default parameter settings or some auto-tuning to determine the number of sweeps, for the ones that can be controlled that way.

Just looking at the output, this is the result of a little mini-test using three inputs. Across the columns you see increasing sample sizes, and all of the little graphs here are for half-second runtimes. What we did for each input class is compute the median relative energy of the sample, and the x-axis is the 25 inputs in rank order, so this is an empirical cumulative distribution plot. I've put a blue dot in every panel where the blue curve, the quantum annealing (QPU) curve, is strictly below the solution qualities of all the other solvers. So in particular, for S = 1 and T = 0.5 seconds, out of these three tests the QPU wins two. This is called a horse-race analysis: you have winners and ties, and you're just counting up how many wins you get. In this case, when S is 1,000 and T is half a second, the quantum annealer wins all three times.

The other thing we're looking at is classical fails. That happens when, for example, with S equal to 1,000 and T equal to half a second, you notice a lot of the lines are missing in these graphs. That's because the classical solvers simply can't return 1,000 samples in half a second; they're hitting their own lower bounds and just can't fulfill the task. So we're looking at those two questions going forward.

Now, the results for Milestone 1. If you consider only the access time, which is anneal time, programming, and readout, and which is about 100 times slower than the pure anneal time, then how many of the 13 input classes does the QPU win or tie on? These are cases where the logical solvers are reading only the native inputs and the native solvers are reading all of the physical inputs, so they're all reading the same physical inputs. You can see that the quantum annealer is winning in nearly every case. There's one case with small inputs that the quantum annealer didn't win, but out of these 13 cases it's beating all of the classical solvers in nearly all of them. So we're going to call that a milestone passed, basically.

Then for Milestone 2, if you consider only the logical inputs, with the logical solvers working on the smaller problem while the quantum processor works on the larger embedded problem, we can see the quantum processor isn't winning all the time, but it still can win in some cases. In particular, it's doing best in this upper range where the runtimes are small and the sample sizes are large; those are the places where it's finding some success. And if we look at classical fails: in each of these boxes there are six solvers, and you ask how many of them didn't qualify, couldn't solve the problem, in each of the 15 cases. The answer up in the upper right-hand corner is that 2.8 solvers, on average, couldn't fulfill the task. And you can see that the cases where the classical solvers are having trouble are the same cases where the quantum solver is doing well.
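Here is a small sketch, in plain numpy, of the win-counting rule just described: for one input class and one (S, T) cell, each solver's 25 median relative energies are put in rank order, and the QPU scores a win when its curve lies strictly below every classical curve. The solver names and the random numbers below are purely illustrative stand-ins for the real data.

```python
# Illustrative horse-race tally for a single input class and a single (S, T)
# cell. Real data would be the 25 per-input median relative energies for each
# solver; random values stand in for them here.
import numpy as np

rng = np.random.default_rng(seed=0)
solvers = ["QPU", "SA-optimize", "SA-sample", "PT-1", "PT-2", "greedy"]
n_inputs = 25

# Each solver's 25 median relative energies, sorted into rank order (the ECDF).
curves = {name: np.sort(rng.random(n_inputs)) for name in solvers}

qpu_curve = curves["QPU"]
classical_curves = np.array([curves[s] for s in solvers if s != "QPU"])

# "Blue dot" rule: the QPU curve is strictly below all classical curves.
qpu_wins = bool(np.all(qpu_curve < classical_curves.min(axis=0)))
print("QPU wins this input class:", qpu_wins)
```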
If we look for an explanation for this behavior, and ask what it is about these inputs that makes the problems easy for the quantum processor and hard for the classical solvers, basically the single most important indicator is n, the number of variables the problem has. You can see that here in our two extreme cases, a clique on 175 variables versus a native problem with 5,000 variables. The difference is pretty clear: blue is the QPU, green is simulated annealing, red is steepest greedy descent, and we have random solutions up here just as a line in the sand. When the problems are small, just 175 variables, the greedy solver and simulated annealing very quickly find what we can assume are optimal solutions, because of the consensus. The gray band here is the operational region for the quantum annealer, after some programming time; there's a time limit of one second imposed on online users, so this region runs from taking one sample up to roughly half-second runtimes. In that half second, because the problems are so small, classical really has no problem converging very quickly before the quantum annealer even gets started, because of its overheads.

Whereas when the problem gets 30 times bigger, the classical solvers' run times of course grow with n, exponentially in the worst case, but we're keeping the run times and the sample sizes the same, and they can't do the job in that comparatively short run time. The quantum annealer, meanwhile, is doing fine on these really large problems. It does better on the large sparse problems than on the small dense problems because of chains and chain lengths, but the real point is that it does better on large inputs than on small inputs.

I'll say that the first time I ever did any benchmarking work on a quantum annealer was on the D-Wave Two, a decade ago, and these were the same run times we were looking at then. It only had about 500 qubits, and it got mixed results against classical. But the run time of the quantum processor, the effective run time for users, hasn't changed in five generations over 10 years, whereas classical solvers have been struggling more and more to keep up with that kind of run time. So that bodes well, we think, for future systems.

Milestone 1 is really about the past, and where we are right now we can see that the QPU does just fine against classical when you consider the overheads of programming and readout. In terms of the balance between n, the logical problem size, and q, the embedded problem size, it's not so much about the difference or the ratio between n and q; it's the fact that when n is a very small, dense problem, classical can solve it in so much less than the programming time that classical is eating these problems for lunch. So we expect that as the processor gets bigger, while the quantum overhead costs pretty much stay the same and the quantum solution quality improves every generation in terms of fidelity and noise suppression, the essentially inevitable exponential growth in classical cost with increasing n at fixed T is going to lead to more classical fails and more quantum wins. And that's it. Are there any questions?
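The access-time accounting used throughout the talk, a fixed programming cost plus a per-sample cost for anneal, readout, and delay, can be sketched as simple arithmetic. The function name and the timing constants below are assumed ballpark placeholders, not measured values for any particular processor.

```python
# Back-of-the-envelope model of QPU access time: a one-time programming cost
# plus, for each of the S samples, anneal + readout + inter-sample delay.
# All constants are assumed ballpark values, not measurements.
def qpu_access_time_ms(num_samples: int,
                       programming_ms: float = 15.0,    # assumed one-time programming cost
                       anneal_us: float = 20.0,          # assumed anneal time per sample
                       readout_delay_us: float = 150.0   # assumed readout + delay per sample
                       ) -> float:
    per_sample_ms = (anneal_us + readout_delay_us) / 1000.0
    return programming_ms + num_samples * per_sample_ms

# For small S the fixed programming time dominates; for large S it amortizes away.
for s in (1, 10, 100, 1000):
    print(f"S = {s:5d}: ~{qpu_access_time_ms(s):7.1f} ms access time")
```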
Thank you, Catherine. OK, thank you, Pika. So, questions; let's check the chat, if there are any. I've got a question, actually. On slide 12, I didn't see anything really converging to zero relative error on that bottom plot. How did you compute your upper bound? Was it with some sort of branch-and-bound method, or did you just run these for a very long time?

Yeah, you can't assume much about what is optimal here. We only looked at relative error compared to the best solution in the pool, and this zero point here just came from another test with different parameters. We're really channeling the typical user experience, the way classical heuristics are usually evaluated, which assumes there's some time limit imposed on the user and they need a solution by some fixed time; they don't necessarily expect to find optimal solutions. In this case we can kind of assume that if you see a lot of solutions all landing in the same place, it's probably optimal, but we can't assume anything about the native inputs here. There must have been some other solutions, found outside of this plot, that gave us the zero point. Is that answering your question?

Yeah, I believe so. Looking closely, I can actually see a blue cross down on that line, actually. Oh, yeah, I think you're right. Good eye. Thank you. And in general, this was part of a pool of solutions returned by all of those experiments, so it could have come up in another scenario on this input.

OK, other questions? Thank you for the nice talk. I guess it's just a general question as to what you see as some of the barriers to wider application use for these types of algorithms. Is it still education, or getting bigger test beds, or better integration?

Well, I think part of it is the processor growing. We did try to really go for application-inspired logical structures, but there are problems with constraints where the chip is basically just not large enough yet: we can't embed large enough problems, given the extra qubits needed to represent constraints. So in terms of broadening the scope, yes, I think larger inputs. That's partly why we have vigorous hybrid computation support right now: while waiting for the processor to get bigger, we have hybrid solvers that decompose large problems and send sub-queries to the quantum processor to help guide the search on large problems. But to my mind, it's mostly about getting bigger, so that a greater variety of problems can be represented.

OK, other questions? Very nice talk, thanks. I just had a question of whether you tried parallel tempering as well, on this slide. Yeah, that's what the PT is, parallel tempering. It has a lot of parameters, and we had two versions of it; we tried it two different ways, basically.

OK, so I was asking on the last slide whether you had also results for parallel tempering, because I only saw... Oh, no. I would say, generally, it started getting too cluttered, so I guess we just stopped. But parallel tempering has more loops and more parameters, and it generally starts later than simulated annealing; it has its own quite considerable overheads. Both of those parallel tempering codes were the ones most likely to fail when the times got small and the sample sizes got large. It's quite a complex algorithm; it's just not fast enough for these very fast scenarios. Other questions?
OK, so let's thank Catherine again, and we'll move on to another online talk.