Hi everyone, I'm David McKay from IBM Research, and today I'm going to talk about benchmarking near-term quantum computers. Before getting into our processes and protocols for benchmarking, I think it's important to motivate and contextualize by mentioning some of the recent advances in near-term devices. One of those advances has been the move toward putting systems and devices on a cloud platform. What this has really allowed is standardized access points to these devices. For example, at IBM we have our devices on the IBM Quantum cloud, and we also have a software stack for accessing those devices that includes standard interfaces for controlling the devices with gates through our OpenQASM protocol and through pulses via our OpenPulse protocol. What this allows from a benchmarking perspective is to start thinking about standardized benchmarking protocols that can be applied across a number of different devices, and even a number of different device technologies. As part of our Qiskit software, the Qiskit Ignis package is our characterization component. The other advance to mention is the gradual evolution toward larger devices. When we first put devices on the cloud in 2016, we started with five-qubit devices, and that number has steadily increased over time, from 16 to 20 and now even 53-qubit devices. This has created a particular new challenge for benchmarking protocols. We visualize our benchmarking efforts through a pyramid structure, where at the bottom we have things like the device specification. There we gather a lot of information about the device, but it's very individualized: things like the coherence properties of the individual qubits and the frequencies, properties that are mainly static and set largely through fabrication and device design.
As we move up this pyramid, the benchmarking information involves more and more qubits and becomes more complex, but tells us more about the holistic nature of the device. Moving up from the device specification, we enter what we call subsystem benchmarking, which includes things like randomized benchmarking. Finally, we reach holistic benchmarks, which give us only a few numbers characterizing the device, but those numbers characterize its holistic performance, the algorithmic power of the device. Each layer in this pyramid represents a stopping point where we have to hit specific success metrics to continue further up. This talk could, of course, really be all about this pyramid and all its individual components, but I'm just going to very briefly mention what subsystem benchmarking is and then close by talking about some of our new results in holistic benchmarking. As an overview: how do we get operational information about our device through subsystem benchmarking? Our goal, as I just said, is to get operational information that has some predictive power but that we can obtain rather quickly, so we can continually update it. IBM, much like a number of other groups, has really leaned on the randomized benchmarking protocol because it's relatively quick and it's not sensitive to state preparation and measurement errors. As a reminder, in randomized benchmarking we select a string of L random Clifford gates, then calculate the inversion gate, which can be done efficiently because these are Cliffords. Starting in the zero state, we apply this sequence to the set of n qubits and then measure the polarization after the application of that sequence.
You can see that for each individual random sequence you get a rather noisy distribution, but if you average over many random sequences you get a nice exponential decay, and that can be fit to an exponential decay function whose fit coefficient is very simply related to the average gate error. When we say subsystem randomized benchmarking, what we mean is that we're going to run randomized benchmarking on a subset of qubits in the device. Typically we want to run it on our native gate set; for our superconducting qubit devices we typically have a native gate set of one-qubit operations and two-qubit operations, so we want to run subsystem randomized benchmarking on all the individual qubits and on all the qubit pairs that have a two-qubit gate in the set. In fact, that's the type of information we return from a device when we characterize it, so here's a typical characterization you get from one of the IBM cloud devices. At the top are some of the characterizations from the lowest level of the pyramid, the device specifications, but down here we have the single-qubit errors from randomized benchmarking and then all the two-qubit errors from randomized benchmarking. For this particular example, qubit 3's single-qubit gate error is obtained by applying this randomized benchmarking sequence and fitting a curve like this. This doesn't give the full process information that a process map would, but it's still a very rich set of information, and you can map this distribution of errors across the device and really get a feel for device performance. Here's the 53-qubit device and then a set of the 20-qubit devices, and even from this simple randomized benchmarking metric you can clearly see the improvement over time in the error distributions and the mean errors. But one of the questions that arises from these subsystem metrics is: how operational are they really?
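The fitting step described above can be sketched as follows (a minimal illustration with SciPy; the survival-probability data are made up and noiseless for clarity). The averaged survival probability versus Clifford length m is fit to A·alpha^m + B, and the average error per Clifford follows from the decay parameter alpha as (2^n − 1)/2^n · (1 − alpha).

```python
import numpy as np
from scipy.optimize import curve_fit

def rb_decay(m, A, alpha, B):
    # Standard RB decay model: survival probability vs. Clifford length m.
    return A * alpha**m + B

def average_gate_error(alpha, n_qubits):
    # Average error per Clifford extracted from the fitted decay parameter.
    d = 2**n_qubits
    return (d - 1) / d * (1 - alpha)

# Synthetic single-qubit RB data (hypothetical numbers, noiseless for clarity).
lengths = np.array([1, 10, 20, 50, 100, 200])
true_A, true_alpha, true_B = 0.5, 0.995, 0.5
survival = rb_decay(lengths, true_A, true_alpha, true_B)

popt, _ = curve_fit(rb_decay, lengths, survival, p0=[0.5, 0.99, 0.5])
A_fit, alpha_fit, B_fit = popt
error_per_clifford = average_gate_error(alpha_fit, n_qubits=1)
print(f"alpha = {alpha_fit:.4f}, error per Clifford = {error_per_clifford:.5f}")
```

With real data one would average many random sequences per length before fitting, which is what makes the fitted alpha insensitive to state preparation and measurement errors (those are absorbed into A and B).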
They're a very good starting point for trying to make predictions about the device, but it's clear that in some cases the subsystem metrics are not enough to predict full algorithmic power, and that's why we move up one more level on the pyramid to look at holistic benchmarks. A number of holistic benchmarks have been proposed and demonstrated. For one, we can take randomized benchmarking and, instead of doing subsystem benchmarking, benchmark the whole device. We can look at cross-entropy metrics, or algorithmic metrics: people have looked at VQE as a metric for the device, where you run VQE on a known Hamiltonian. Entanglement metrics are also popular. For the end of the talk, I'd like to discuss quantum volume, which is our holistic metric developed here at IBM. What is quantum volume? Quantum volume is a procedure that tries to measure the largest effective square circuit you can run. The algorithm splits the circuit into layers, and each layer consists of completely random SU(4) gates, meaning completely general, random two-qubit gates between random pairs of qubits, irrespective of the topology of the device. So as represented here, these are random permutations, but you can think of each layer as a random SU(4) between each random pair of qubits. We do that in each of these depth slices, which gives you a full n-qubit unitary, and then you compile that model circuit to your device based on your device topology. What is the success criterion of this algorithm? Well, you run this experiment: you've generated some random circuit, you run it, and you get a set of bit strings. Then you compare that to the classically simulated output, which should be perfect, and that's the ideal set of bit strings. You then look for outputs that are so-called heavy, meaning those outputs whose probability of occurring is greater than the median output probability.
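The layer construction just described can be sketched as follows (plain NumPy, an illustrative sketch rather than the Qiskit implementation; all function names are mine). Each depth slice is a random pairing of the qubits, with each pair assigned a Haar-random SU(4).

```python
import numpy as np

def haar_random_su4(rng):
    # Haar-random 4x4 unitary via QR decomposition of a Gaussian matrix,
    # with the R phases fixed, then rescaled to determinant 1 (SU(4)).
    z = (rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    q = q @ np.diag(d / np.abs(d))
    return q / np.linalg.det(q) ** 0.25

def model_circuit_layers(n_qubits, depth, seed=0):
    # Each layer: a random permutation pairs up the qubits (ignoring device
    # topology), and each pair gets an independent Haar-random SU(4).
    rng = np.random.default_rng(seed)
    layers = []
    for _ in range(depth):
        perm = rng.permutation(n_qubits)
        pairs = [(perm[i], perm[i + 1]) for i in range(0, n_qubits - 1, 2)]
        layers.append([(pair, haar_random_su4(rng)) for pair in pairs])
    return layers
```

For a "square" circuit the depth equals the number of qubits; composing the layers yields the n-qubit model unitary that is then transpiled to the device's actual topology and gate set.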
Outputs whose ideal probability exceeds the median are the heavy outputs. Once you identify which outputs are heavy from the ideal distribution, you measure experimentally how many of those you obtained: what fraction of the bit strings you measured were heavy? The success criterion is that over two-thirds of what you measure are heavy outputs. Quantum volume is then simply defined as two to the power of the largest qubit subset where you still satisfy this success constraint. What does that look like in experiment? Here, for four qubits, after one depth of the volume algorithm you can see that the white bars, which are the ideal distribution, and the green bars are very heavily overlapped; here the experiment had a heavy output fraction of 0.8. But by depth four the system has really depolarized: the experimental distribution is basically flat and the heavy output fraction is only 0.5. This shows how the errors have destroyed your ability to implement that unitary. To close the talk, I'd like to mention our most recent results demonstrating a quantum volume of 32 on one of our new devices. This work was done mostly by Petar Jurcevic and should appear in a forthcoming publication. This is our Raleigh device, a 28-qubit heavy-hex device. Here are some typical median and maximum coherence numbers: a median coherence of 100 microseconds and a maximum of 250. Here are some of the subsystem benchmarks from the device, the two-qubit and one-qubit errors. Sorry, backing up: this is the set of five qubits that we're going to demonstrate quantum volume on; it's a linear chain of five qubits. These are the properties of that chain, and you can see that the errors are quite good, some less than 1%, with fairly low ZZ interactions between the qubits. And here's a crosstalk map of that chain as well.
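The heavy-output bookkeeping described above can be sketched as follows (an illustrative snippet with made-up numbers; the full protocol additionally averages over many random circuits and requires the two-thirds threshold to hold with statistical confidence, which this sketch omits).

```python
import numpy as np

def heavy_outputs(ideal_probs):
    # Heavy outputs: bit strings whose ideal probability exceeds the median.
    median = np.median(ideal_probs)
    return {i for i, p in enumerate(ideal_probs) if p > median}

def heavy_output_fraction(counts, heavy_set, shots):
    # Fraction of measured shots that landed on a heavy output.
    return sum(counts.get(i, 0) for i in heavy_set) / shots

def passes_qv_criterion(fraction, threshold=2 / 3):
    # Success criterion: more than two-thirds of shots are heavy.
    return fraction > threshold

# Toy 2-qubit example: hypothetical ideal distribution and measured counts.
ideal = np.array([0.45, 0.30, 0.15, 0.10])
heavy = heavy_outputs(ideal)                  # outputs 0 and 1 are heavy
counts = {0: 500, 1: 280, 2: 120, 3: 100}
frac = heavy_output_fraction(counts, heavy, shots=1000)
print(heavy, frac, passes_qv_criterion(frac))
```

An ideal noiseless device achieves a heavy-output fraction of about 0.85 asymptotically, while a fully depolarized one gives 0.5, which is why the flat experimental distribution at depth four lands at 0.5.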
The crosstalk measure is: if I apply a drive to this qubit, what percentage of that drive ends up on the other qubit? You can see that the values are predominantly lower than 1%. And then if we look at the actual quantum volume numbers, here are some prior results on quantum volume from our other devices. You can see that on IBM Q System One, which is now Johannesburg, we obtained a quantum volume of two to the four, so 16. And now, in our most recent results on this Raleigh device, with some optimization of our gates we get a very clear signature of quantum volume 32. I'd like to close on that and just mention that if you want to try quantum volume on your own device, please install Qiskit and check out the Qiskit Ignis package. There's code in there to generate, run, and fit quantum volume experiments so you can make graphs just like this. All right, thank you.