I'd like to start off by thanking the organisers for inviting me to speak here today, and also for putting on such a fascinating programme. It's been a really fantastic meeting so far. So, I want to talk about classical simulation of quantum computers with few non-Clifford gates.

A little bit of motivation, if it's needed, for why we're interested in classically simulating quantum computers. The theme of this meeting is, of course, the quantum-classical frontier and when we're going to reach it. Many of the talks throughout this meeting have focused on trying to bring that frontier closer and closer, or sooner and sooner, by coming up with new and interesting ideas for how we might demonstrate that some quantum device is superior to a classical device. But we haven't really done that if we haven't also done a thorough job of thinking about the best possible classical algorithms we could come up with. So that's what this talk is about.

But there are other uses for quantum simulation methods too. One is verifying and validating that hardware and software do what you think they do, because bugs do crop up in software, and also in hardware, so you need some way of checking or double-checking that what you've got is correct. Then there's the question of benchmarking performance to improve design. Maybe you have two different methods of doing something and you want to know which one is best. As we saw yesterday with the Trotter-Suzuki decomposition, there can be different ways to do this, and whilst the lower bounds tell you that you need a certain number of gates to achieve a certain accuracy, in actual fact, when you look at the numerical analysis, what you find is that you need a much smaller number of gates. So if we can get better and better tools for simulating larger and larger quantum systems, there's some hope that we might be able to better benchmark performance and not rely on analytic bounds that are pessimistic.

Okay, so I'm going to broadly classify all simulators into three categories. The first is the brute-force approach, where you just write down the whole wave function. This will get you up to, say, 30 or 40 qubits, depending on what your hardware is like: it's exponential in the number of qubits but polynomial in the number of gates you apply. The second category, which we've heard quite a bit about in this meeting, is tensor network approaches. Here you have an overhead that scales polynomially in the number of qubits, so IBM have a recent paper simulating a 56-qubit device, but it's exponential in the depth, and it's more suited to circuits where the depth can be handled because the layers are explicitly 1D or 2D or have some kind of locality structure. But what I'm going to focus on throughout this whole talk is another approach, which I'll call stabiliser simulation. Here, again, you in some sense respect this huge, enormous, exponential Hilbert space, and try to do things in a way that's polynomial in the number of qubits but has some exponential overhead.
The exponential overhead is going to be in the number of non-Clifford gates, so T gates or Toffoli gates, but you're going to be able to handle a polynomial number of Clifford gates, so CNOTs, Hadamards or the S phase gate, and you're also not so constrained by locality: you don't have to worry about the kinds of locality constraints you might have with tensor networks. So those are the different types of simulators I'm going to talk about today, and I'm going to give you more of an overview talk. I'll mention some old ideas, some newer ideas, some ideas by other people, and I won't go into a huge amount of detail on each one; I'll just give you an overview of the different kinds of stabiliser simulators out there at the moment.

The earliest work on this, demonstrating that it is in principle possible, is the paper by Aaronson and Gottesman, which showed, for example, that if you have n T gates then you can simulate this with a runtime that scales like 4 to the power of n. That's a really, really quickly growing exponential, so you're not going to get to a circuit with very many T gates this way, but all they really wanted was a proof of principle that you could have something with these kinds of scaling properties. What we're interested in here is taming this exponential down to a gentler one, so there's some hope that we might get to larger system sizes.

This is the structure of my talk: I'll give you some background material on what I would call the magic state model of computation, or the stabiliser model of computation, and then three sections that describe three different types of simulators. The first kind are quasi-probability, or Monte Carlo, simulators. The next two are similar, you might even group them together, because they're both stabiliser rank simulators; the thing that differentiates them is that one is exact and the other is approximate. Here are some references for these sections. The quasi-probability simulator work is mostly based on a paper by myself and Mark Howard, building on what other people have done in quasi-probability simulation, and the ideas behind stabiliser rank simulators go back to two IBM papers, one for exact simulation and one for approximate. I'm also going to tell you today about some new work which is in progress slash in preparation, from a larger collaboration of people which includes myself, Sergey Bravyi, David Gosset, Mark Howard, Dan Browne, who's at UCL, and Dan Browne's student.

So let's cover the background material. Before we define the magic state model of computation, we have to say what the stabiliser model of computation is. This is a model of computation where you can prepare stabiliser states, apply Clifford unitaries, measure Pauli operators, and perform classical feed-forward and adaptivity. The stabiliser states include things like the computational basis states |0⟩ and |1⟩. The Clifford operations include CNOT, Hadamard and the S gate, so you can generate entanglement, superposition, all of these things. But such circuits can be simulated using the celebrated Gottesman-Knill theorem, so you have some polynomial scaling in the amount of time and bits that you need to simulate such a circuit.
To make this universal, you can add essentially almost anything that isn't Clifford and get a universal gate set. The go-to choice is the π/8, or T, gate. But you don't actually have to have the gate: you can instead have the T magic state, because the circuit on the right-hand side is what we call a state injection circuit. If you have one copy of this T state, you can perform a CNOT and make a measurement. If you get the zero outcome, you get the T gate; if you get the one outcome, you need to apply some other small Clifford correction, but in both cases, once you've done the correction or not, you end up with the gate you wanted. So you can deterministically get the gate from having the state. This is a fairly well-known fact.

What's maybe slightly less well known is that there are other states and gates for which this works, other ways to deterministically inject a unitary that isn't Clifford. An example is the CCZ gate, which applies a minus one if the qubits are in the |111⟩ state. This is closely related to the Toffoli gate, so it crops up in lots of reversible logic circuits. We can associate with this gate a CCZ magic state, which is just the CCZ unitary applied to the |+++⟩ state, and we can deterministically use this magic state to inject the gate. Again we do a sequence of CNOTs and measurements, and we have to apply some Clifford correction depending on what the measurement outcomes are.

More generally, we can characterise the whole family of unitaries and magic states that we can use as resources to deterministically inject unitaries. If we've got a unitary U which is diagonal, and we conjugate each Pauli by it, U P U†, and end up with a Clifford for every single Pauli P we try this with, then we say this unitary belongs to the third level of the Clifford hierarchy, and it means this circuit can be used to deterministically inject the unitary U. The connection between the maths and the circuit is that the Clifford corrections correspond to exactly the Cliffords you get when you perform this conjugation.

In this talk we'll call such magic states Clifford magic states, because in addition to being usable for deterministically injecting some non-Clifford unitary, we can think of them as eigenstates of some Clifford. They're stabilised by some Clifford unitary: much the same way that a stabiliser state can be uniquely defined as an eigenstate of some Pauli operators, the Clifford magic states are uniquely defined as eigenstates of some Clifford unitaries. These will play a special role in today's talk, because they can be used to deterministically inject.

This gives us the magic state model of computation, where everything we had earlier in the stabiliser model of computation is a free, easily simulatable component, and we add some number of components that don't fit inside this formalism, which give us magic states. Then we ask: how hard is it to simulate this system, and what is the simulation cost in terms of how many magic states there are, or how much magic there is? If we think like entanglement theory, we're not just interested in how many entangled states we have but how much entanglement there is, so there's an analogy here between the resource theory of entanglement and the resource theory of magic.
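This third-level condition is easy to check numerically. Here's a minimal numpy sketch (not from the talk; the helper names are my own) verifying that conjugating the Paulis by T yields Cliffords, while T itself is not Clifford:

```python
import numpy as np

# single-qubit Paulis
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])
PAULIS = [I2, X, Y, Z]

def is_pauli_up_to_phase(M, tol=1e-9):
    """True if M = e^{i phi} P for some Pauli P."""
    for P in PAULIS:
        phase = np.trace(P.conj().T @ M) / 2
        if abs(abs(phase) - 1) < tol and np.allclose(M, phase * P, atol=tol):
            return True
    return False

def is_clifford(U, tol=1e-9):
    """True if U maps X and Z (hence every Pauli) to Paulis up to phase."""
    return all(is_pauli_up_to_phase(U @ P @ U.conj().T, tol) for P in (X, Z))

# T sits in the third level: conjugating any Pauli by T gives a Clifford...
T = np.diag([1, np.exp(1j * np.pi / 4)])
print(all(is_clifford(T @ P @ T.conj().T) for P in (X, Z)))   # True
# ...whereas T itself is not Clifford (T X T^dag is not a Pauli):
print(is_clifford(T))                                          # False
```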
So I'm on the first of my three categories of simulators: quasi-probability simulators, and I'm mostly going to talk about work with Mark Howard. What is a quasi-probability distribution? Anyone with a physics background who has done quantum optics or thought about superconducting circuits will be familiar with a picture like this, which illustrates the Wigner function of a system. A Wigner function is a quasi-probability distribution because it integrates to one, much like a probability distribution, and if we take marginals that give us the position or momentum degrees of freedom, we also get positive-valued functions. But it's not quite a probability distribution, because it can be negative in some places.

We're not interested in continuous-variable systems today, though; we're interested in more discrete forms of computation. It's well known that there's a very good and robust analogue of the Wigner function for qudit quantum computers, where you have a d-level system with d an odd number. In this case you can define a Wigner function such that it is positive whenever you have a stabiliser state. Furthermore, it's been shown that if you add up all the places where it's negative, you can quantify the negativity, or mana, of a state, and this is a good way to measure the non-stabiliserness of a state. It was also shown by Pashayan et al. that there's a simulation algorithm for circuits with some number of these magic states whose overhead scales with the amount of negativity in the quasi-probability distribution.

That's really interesting, but unfortunately there are some mathematical curiosities which mean it doesn't translate over nicely to the qubit setting. In the qubit setting, if you try to define a Wigner function that is positive for all stabiliser states, it simply doesn't work: there's always some stabiliser state for which the distribution looks negative. So we wanted to come up with some other notion of a quasi-probability distribution for the qubit magic state model of computation, and this is what we came up with. We write our state down as a sum of density matrices which are stabiliser states, each with some probability in front of it, but we now allow these to be quasi-probabilities. It turns out that all states, even non-stabiliser states, can be written in this form once the probabilities are allowed to become negative. So we can always write a state in this decomposition, and we can then quantify how far it is from being a stabiliser state by taking the sum of the absolute values of these quasi-probabilities p. This gives us a good measure, which we call the robustness of magic of a state. It's well behaved, it satisfies all the usual criteria you would want if you were building a resource theory of magic, but the really important criterion today is that it's operationally meaningful. What I mean by operationally meaningful is that this quantity, the robustness, tells you how hard it is to classically simulate a circuit.
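For small systems, the robustness is directly computable as a linear program: minimise the one-norm of the quasi-probability vector subject to the decomposition reproducing the state. A minimal scipy sketch (my own illustration; for one qubit the stabiliser states are the six Bloch-octahedron vertices, and for the T state this should return roughly √2 ≈ 1.414):

```python
import numpy as np
from scipy.optimize import linprog

I2 = np.eye(2); X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]]); Z = np.diag([1, -1])

def pauli_vec(rho):
    # expansion coefficients of rho in the basis {I, X, Y, Z}
    return np.real([np.trace(rho @ P) for P in (I2, X, Y, Z)])

# the six pure single-qubit stabiliser states, via their Bloch vectors
bloch = [(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)]
stab = [(I2 + x*X + y*Y + z*Z) / 2 for x, y, z in bloch]
A = np.column_stack([pauli_vec(s) for s in stab])

# target: the T magic state |T> = T|+>
ket = np.array([1, np.exp(1j * np.pi / 4)]) / np.sqrt(2)
b = pauli_vec(np.outer(ket, ket.conj()))

# minimise ||q||_1 subject to A q = b, via the split q = u - v, u, v >= 0
n = A.shape[1]
res = linprog(c=np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=b,
              bounds=(0, None))
q = res.x[:n] - res.x[n:]
print("robustness R =", np.abs(q).sum())   # ~1.41421 for |T>
```

The same formulation works for a few qubits by enumerating all n-qubit stabiliser states, which is why blocks of around five qubits are the practical limit mentioned shortly.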
Specifically, it tells you how hard it is to classically simulate the circuit using a specific algorithm: the runtime of the algorithm I'll describe now scales with the robustness squared. I'm just going to give you a rough overview of this algorithm; it draws heavily on other quasi-probability algorithms. We take our circuit and break it down into Clifford operations plus magic states that we inject to give the non-Clifford component, and then we build a renormalised probability distribution: we take the absolute value of each quasi-probability and divide by the appropriate constant so that the probabilities now sum to one. Then we loop through a procedure. In each iteration we choose one sample, some stabiliser state σᵢ, with probability pᵢ; we simulate our Clifford circuit acting on this state; and at the end of the circuit we make the measurement. We take the outcome of this measurement, which has value plus one or minus one, and before we output it we multiply it by an extra pre-factor: the robustness of the state times the sign of the sampled term, that is, whether it was a positive or a negative contribution to the state. If you do this, then the random variables coming out of this machine have the same mean as you would get from actually running the quantum computation, but the variance is much, much larger. To combat this increase in variance you need more and more samples, and using simple concentration inequalities one finds that the number of samples has to scale like the robustness squared.
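Schematically, the sampling loop looks like the sketch below (my own rendering; `run_clifford_and_measure` is a user-supplied Gottesman-Knill-style stabiliser routine, and all names here are hypothetical):

```python
import numpy as np

def quasi_prob_estimate(q, stab_states, run_clifford_and_measure,
                        n_samples, rng=None):
    """Monte Carlo estimate of a +/-1 measurement's expectation value for a
    circuit whose input is rho = sum_i q_i sigma_i, q a quasi-probability.

    run_clifford_and_measure(sigma) -> +1 or -1 must simulate the Clifford
    part of the circuit on the stabiliser state sigma and return the outcome.
    """
    rng = rng or np.random.default_rng()
    R = np.abs(q).sum()              # robustness = one-norm of q
    p = np.abs(q) / R                # renormalised sampling distribution
    signs = np.sign(q)
    total = 0.0
    for _ in range(n_samples):
        i = rng.choice(len(q), p=p)                   # draw a stabiliser state
        m = run_clifford_and_measure(stab_states[i])  # +1 or -1
        total += R * signs[i] * m                     # reweight the outcome
    return total / n_samples         # unbiased; variance grows like R**2
```

By Hoeffding-type bounds, additive error ε with failure probability δ needs on the order of R² log(1/δ)/ε² samples, which is where the robustness-squared runtime comes from.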
So that's essentially what's going on with the algorithm, and the interesting question becomes: if we want to minimise the robustness, because that's what dictates the simulation overhead, how do we find good decompositions? We can do this using simple convex and linear programming tools in MATLAB. The interesting, slightly counter-intuitive thing is this: if I wanted to find a decomposition of many copies of a T state, I could work it out for one T state and just copy it n times. But if I take a larger block, say two or three of these T states, and run the convex optimisation on the block, I get a robustness which is slightly smaller than you might have expected. If you took all n states and tried to calculate the robustness directly, it would be intractable, so you instead block it up into smaller blocks; the largest block for which we could manageably calculate the robustness so far is five qubits. That gives us a scaling which tells us that if we want to simulate a circuit with n of these T states, it's going to have a runtime that scales as shown.

I've mostly talked about T states, but we're not limited to those; we can also think about the other Clifford magic states that we can deterministically inject. If we look at the CCZ state, for example, we can again run our convex optimisation. The interesting thing here is that CCZ, or Toffoli, is something we know we can synthesise using four T gates, so you can compare which is best: should I build this circuit using a CCZ gadget, or using four T states? What you find is that the scaling is much, much worse if you actually break it down into T gates. This suggests we should really be taking our circuits and finding ways to break them down into sub-circuits where the robustness, the difficulty of simulating them, is as low as we can make it.

So that's everything I wanted to say today about quasi-probability simulators; I'm going to move on to the first of the two parts of the stabiliser rank work. Much of this work was first introduced in an IBM paper; I'm going to review what's covered in that paper and then introduce a selection of new results from the paper that is hopefully soon to be written. Let's start with a definition. The name stabiliser rank comes from Schmidt rank: the Schmidt rank asks, if I have some entangled state and I try to write it as a superposition of separable states, how many terms do I need? Here we ask: if I write my non-stabiliser state as a superposition of stabiliser states, what's the minimum number of terms I can possibly use? If we define such a quantity, we again find it's a well-behaved monotone and satisfies all the axioms people normally look for in a good way of quantifying a resource, and the important one for today is that it's operationally meaningful: the IBM group proposed an algorithm with a runtime that scales with the stabiliser rank. So we're interested in what the algorithm is and how to minimise the stabiliser rank of states. I'll give you an overview of the simulation method now.
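To make the definition concrete, here's a small numpy check (my own) of a fact that comes up in a moment: two copies of the T state still have stabiliser rank 2, via a decomposition into two entangled stabiliser states.

```python
import numpy as np

w = np.exp(1j * np.pi / 4)
T = np.array([1, w]) / np.sqrt(2)      # |T> = (|0> + e^{i pi/4}|1>)/sqrt(2)
TT = np.kron(T, T)                     # two copies

# two entangled stabiliser states (unnormalised)
s1 = np.array([1, 0, 0, 1j])           # |00> + i|11>
s2 = np.array([0, 1, 1, 0])            # |01> + |10>

print(np.allclose(TT, (s1 + w * s2) / 2))   # True: chi(|T>^{x2}) = 2
```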
Again we gadgetise our circuit: we break it down into Cliffords plus magic states, and we try to find a low stabiliser rank decomposition of our state. Then we take our circuit, which we can think of as a Clifford followed by a projector onto some outcome of the measurements you've made, and we look at how this affects each individual term in the decomposition. Because each term is a stabiliser state and this is just a Clifford operation, we can calculate the new state resulting after we've applied this process. This state might not be normalised any more, it might be subnormalised, which is an important point. Once we've done that, we want to know the probability of getting that particular outcome, so we have to work out the norm of this subnormalised state.

There are two different ways to do that. The first, from the original Bravyi-Smith-Smolin paper, is to calculate it exactly by taking the bra and the ket, putting them together and summing over all the terms. The bra and the ket each have χ terms, so you end up with χ² terms, each of which is an overlap between two subnormalised stabiliser states, and these are things we know how to calculate (a toy version of this χ² sum appears below). So you can exactly calculate the probability of this particular outcome with a complexity that scales with χ². But in a later paper, Bravyi and Gosset showed that you can actually get a better scaling, which runs with χ rather than χ², a very, very significant saving, at the price of a small multiplicative error from an approximate norm estimation procedure. There are a lot of technical details there that I won't go into; the important take-home message is that if you find states with low χ, you get a runtime that scales with χ rather than χ².

So again we find a situation where, if we want to take lots and lots of states and find the exact stabiliser rank, we can't do it directly, so we break it up into smaller blocks, and the same counter-intuitive thing happens again: if we take blocks of several copies, the stabiliser rank is smaller than you might have expected based on a single state. For a single T state the stabiliser rank is 2: I always have to write it as 2 terms. But for 2 copies it's still 2, which is really surprising: there are 2 entangled stabiliser states you can use to write the state in terms of (the decomposition checked in the snippet above), and this gives you a lower overhead. You can go even further and find a decomposition for 6 qubits, and that gives you an even lower overhead yet. Now, you might have noticed that these stabiliser ranks aren't so large, and the rank didn't actually change going from 1 to 2 copies. This quantity χ is telling us the complexity of simulating a circuit, and it's constant and then looks like it's going up linearly, so I would call this the world's least convincing exponential. We can go a little further numerically, up to 10 copies of the state, and we find it's jumped up by quite a significant amount. So it looks like it behaves linearly and then starts acting exponentially, which is, I guess, what you would expect. But I think it's really important to comment at this point that the rigorous lower bounds we have on this are very, very trivial: there are no exponential lower bounds.
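Here's a toy dense-vector version of that exact χ² calculation (my own sketch; a real stabiliser-rank simulator never builds these vectors, it evaluates each pairwise overlap in polynomial time using the stabiliser formalism):

```python
import numpy as np

def outcome_probability(coeffs, stab_kets, clifford, projector):
    """P(outcome) = || Pi C |psi> ||^2 for |psi> = sum_a c_a |phi_a>.

    Expands the chi^2 double sum
        sum_{a,b} conj(c_a) c_b <phi_a| C^dag Pi C |phi_b>,
    mirroring the structure of the exact Bravyi-Smith-Smolin method.
    """
    M = clifford.conj().T @ projector @ clifford
    chi = len(coeffs)
    p = 0.0
    for a in range(chi):
        for b in range(chi):               # chi^2 overlap terms
            p += (np.conj(coeffs[a]) * coeffs[b]
                  * (stab_kets[a].conj() @ M @ stab_kets[b]))
    return p.real
```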
So if you actually believe that maybe we should be able to classically simulate a quantum computer, then this is a potential place you could do it: if you could find a polynomially scaling stabiliser rank decomposition of a state, you would have an algorithm that actually simulates a quantum computer, and there's no known result for why that couldn't be true, despite numerous attempts. Personally, I don't think that's the case, and I spend more of my energy trying to find a lower bound than trying to find something with polynomial scaling.

So that's the T state. Much in the spirit of the earlier part of the talk, some of the new ideas we're cooking up revolve around saying: maybe I can get a much better simulation method if, instead of breaking my circuit up into T gates, I break it up into some other gate set. If I want to do some rotation about the Z axis and I use optimal synthesis methods, then to get a good rotation, say to a high level of accuracy, I need hundreds of T gates, so we're not going to be able to do even a single rotation using this method. Instead, you can take some other magic state, which corresponds to some other rotation, say U(θ), and use the fact that when you try to inject it, if you get the zero outcome you get the unitary you wanted, and if you get the one outcome there's some correction that should be applied, which is again a non-Clifford correction. You could go through the process of building a gadget, and then another gadget for the correction, but actually none of that is necessary, because when you do one of these simulations you can just post-select in your classical device on a particular outcome. That means the simulation complexity of implementing the unitary U(θ) depends on the stabiliser rank of this other resource.

Here are some numerics we did that look at the stabiliser rank of a generic state |θ⟩. What do I mean by generic? I mean: pick a random state, making sure it's not, say, the T state, which will never happen by chance anyway, or a stabiliser state, and calculate its stabiliser rank. Here, actually, every single state we checked has exactly the same stabiliser rank, and it's almost the same as for the T state. Again it has this flat behaviour, then looks linear, then looks like it's starting to behave exponentially. Sometimes the T state can have a slightly lower rank, and the reason is probably that it has some extra symmetry the other states don't have, but almost every other state behaves the same way.

That was the numerical result, and it got us scratching our heads: what can we say in terms of pen-and-paper proofs? Here's one theorem. If you give me a single-qubit state and then t copies of it, where t is less than or equal to 5, then the stabiliser rank obeys a linear scaling, at most t + 1 terms. Furthermore, and this is kind of surprising, there's always one fixed set of stabiliser states you can use; you don't have to search for them. For example, if I have t = 5, I can take five-fold copies of the six single-qubit stabiliser product states, and these allow me to form a stabiliser decomposition with no more than six terms, because, as the sketch below checks numerically, one can show that this set of six stabiliser states spans the symmetric subspace.
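Here's a quick numerical check (mine) of the spanning claim: the five-fold tensor powers of the six single-qubit stabiliser states have rank 6, and the symmetric subspace of five qubits is 6-dimensional, so they span it.

```python
import numpy as np

# the six single-qubit stabiliser states
kets = [np.array(v, dtype=complex) for v in
        [(1, 0), (0, 1), (1, 1), (1, -1), (1, 1j), (1, -1j)]]
kets = [k / np.linalg.norm(k) for k in kets]

def tensor_power(k, t):
    out = np.array([1.0 + 0j])
    for _ in range(t):
        out = np.kron(out, k)
    return out

t = 5
M = np.array([tensor_power(k, t) for k in kets])   # 6 x 2^5 matrix
print(np.linalg.matrix_rank(M))   # 6: linearly independent, so they span
                                  # the (t+1)-dimensional symmetric subspace
```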
Since all of the five-fold product states sit in the symmetric subspace, you must be able to decompose them in terms of these spanning states. There's also a slightly more sophisticated version of this result for multi-qubit states: if you give me an n-qubit state and I take t copies of it, where t is less than or equal to 3, then the stabiliser rank is at most the dimension of the symmetric subspace. In some sense that's the best we could ever hope for using this proof technique, because what we're doing is looking for a set of stabiliser states that spans the symmetric subspace and concluding that any tensor-power state, which lives in the symmetric subspace, can be decomposed in terms of those states. Why does the number three crop up? It's probably an artifact of the proof technique: the proof at the moment uses the result that the set of stabiliser states forms a projective 3-design. In fact I think this is probably a 5-design or bigger, and may even scale with n, but at the moment we just know it's true for t equal to three or less.

So that's everything I wanted to say about exact stabiliser rank; I'm going to tell you about approximate stabiliser rank now, which again was introduced by an IBM team, Sergey Bravyi and David Gosset. I'll review their work and then again mention some new ideas from these currently unpublished notes. Approximate stabiliser rank is defined in terms of exact stabiliser rank: you take your state and some error δ that you're happy with, and you try to find another state that's δ-close to your original one but has significantly lower stabiliser rank. So you've got a state ψ, but instead you work with a state φ that has much, much lower rank and is close in fidelity, or some other measure that you like. What this means is that if you use this for the purpose of simulation, you'll be able to simulate the circuit with some δ-dependent error but with a much, much lower overhead, which I think is on the next slide.

Okay, so I'll quickly review the Bravyi-Gosset construction for how you get these low stabiliser rank approximations. One way to think about it is that you've got the T state and you write it in something like the computational basis, not quite the computational basis: I've written it as a superposition of two terms labelled 0 and 1, but rotated by an operator R that isn't a unitary, so in fact these will be two different equatorial states. The reason we write it that way is that it allows us to write many, many copies of the T state as a superposition, where x runs over all binary strings of the right length, of tensor products of this operator R applied to |x⟩. Once you've written it in that fashion, the approximation is simply to take some subset, in fact what they take is a linear subspace of all possible binary strings, and sum over just those terms. Because you're summing over a smaller number of terms, the rank is lower. What they show is that if you choose this subspace randomly, then with high probability, given some δ, you can always find an approximating state whose rank scales as shown, with some δ-dependence.
But the exponent is smaller than the ones we had earlier. I'm aware that every few slides I bring up a number and you probably can't remember what the numbers are, so just in case you're wondering, at the end of the talk I'll come back to all of the numbers and give you a comparison. I should also say that the Bravyi-Gosset paper deserves much more than a single slide: it has much more technical content, including the proof that you can simulate systems with overhead that scales like χ instead of χ², and lots of other techniques. But for the purposes of this talk, we're just interested in knowing that if we can find a low-rank state that approximates our original state, then we have an advantage.

What we have now is a more generalised construction; the previous one worked only for T states, whereas this construction works for all states. We write our state down again, and we again form a probability distribution by taking the absolute values of the coefficients and renormalising. Then we just randomly sample states with probability pⱼ and form an approximate state. The more terms we take, the larger the rank, but also the better the approximation. What we're able to show is that the stabiliser rank needed to reach some precision δ scales in a way that depends on the sum of the absolute values of the coefficients c. So the simulation overhead is now captured by the one-norm of the coefficient vector, whereas before it was captured by the number of coefficients: there's a zero-norm optimisation and a one-norm optimisation at work here, and there may even be other kinds of optimisation that are important. Because you want this quantity to be small to get a better simulation, you want to find the lowest possible value of this one-norm of the c vector. This again turns out to be a convex, indeed linear, program, and we'll denote the solution of such a problem by C*: the smallest possible value you could hope for for a given state. A sketch of the random sparsification step is below.
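Here's my own schematic of that sparsification step (hypothetical names; the actual Bravyi-Gosset construction for T states has more structure, using random linear subspaces of bit strings rather than plain i.i.d. sampling):

```python
import numpy as np

def sparsify(coeffs, stab_kets, k, rng=None):
    """Random sparsification of |psi> = sum_j c_j |phi_j>.

    Draws k terms i.i.d. with probability p_j = |c_j| / ||c||_1 and returns
    (||c||_1 / k) * sum of the sampled phase-corrected states: an unbiased
    approximation to |psi> whose stabiliser rank is at most k.  Precision
    delta needs roughly k ~ (||c||_1 / delta)^2 samples.
    """
    rng = rng or np.random.default_rng()
    c = np.asarray(coeffs)
    l1 = np.abs(c).sum()
    p = np.abs(c) / l1
    out = np.zeros(len(stab_kets[0]), dtype=complex)
    for _ in range(k):
        j = rng.choice(len(c), p=p)
        out += (c[j] / np.abs(c[j])) * stab_kets[j]   # keep only the phase
    return (l1 / k) * out
```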
Unlike with stabiliser rank, where we knew very little, we actually know quite a lot about this quantity C*, and we can give a lot of lower bounds on it. As is often the case with a convex optimisation problem, you have what's called the primal problem, the original problem you had, but you also have a dual problem, which is also something you can fairly easily solve, and it gives you a lower bound on the original problem. Every feasible choice of the dual variables gives you something I would call a witness, similar to the idea of an entanglement witness in physics, and here we find that the witness is actually a quantum state. For every quantum state you give me, I can find a lower bound on this C*, the number that quantifies how hard it is to simulate a circuit using ψ as a magic state. How does the lower bound work? You take the overlap between the original state and the one you're using as a witness, divided by the stabiliser fidelity of the witness, by which I mean the maximum overlap between the witness and the set of all possible stabiliser states.

That's the algebraic definition, but you can also think about these things geometrically. The set of stabiliser states forms some convex shape, and it's divided from other states by a series of hyperplanes; at least, the good witnesses tend to correspond to facets of these polytopes, and they're trying to ask: is the state somewhere out here? You get a lower bound for every witness, but some witnesses are better than others. A trivial witness is one that's trying to find out if the state is over here somewhere, when the state isn't even on that side of the hyperplane, so it doesn't tell you anything non-trivial: it might tell you that C* is bigger than some number, but that number is less than one, and you knew C* had to be bigger than one anyway. You can have a bound which is non-trivial, because it gives you some actual constant bigger than one that C* must exceed, but it might not be an optimal witness. It turns out that for these kinds of convex optimisation problems, so-called strongly dual convex optimisation problems, there will always be an optimal witness, one that tells you exactly what the value of C* is, and this fact allows you to prove quite a few things.

One particularly interesting witness is to take the witness to be the state itself. If you do that, you get a simpler lower bound: C*, which quantifies the difficulty of simulating the circuit, is lower bounded by one over the stabiliser fidelity, the maximum overlap with the set of stabiliser states. Furthermore, if the magic state you're interested in is one of these Clifford magic states, the ones we know we can use to deterministically inject a unitary, then this mirror witness, I've only just started calling it that, so let's see if it survives into the paper, is optimal: it tells you exactly what C* is, and it also tells you what the decomposition is. The decomposition becomes a summation over Cliffords, the Cliffords C that stabilise the magic state, remember, that's how we defined a Clifford magic state, as an eigenstate of some group of Clifford unitaries, each applied to a stabiliser state, and the stabiliser state we use is just the one that maximised the fidelity. So we sum over all the symmetries of the state applied to the closest stabiliser state.

That's a nice clean story. The only part of it which isn't clean is that actually solving this optimisation problem, finding the closest stabiliser state, is hard. But the more effort you put into finding the closest stabiliser state, the more efficient your simulation algorithm is going to be. For example, we can look for the closest stabiliser state to the CCZ state, and we find that one of the closest states is just |+++⟩ (checked in the snippet below). This gives us some value for the fidelity, and we can just invert this to get a scaling that tells us the overhead of simulating a circuit with n CCZ gadgets. Again, as before, if instead of using the CCZ gadget we broke it down into four T gates and some Cliffords, we would get a much worse scaling, so it's better to work with larger segments of circuit. On the right-hand side here I have some preliminary data from MATLAB; we believe we could probably get to larger scales than we have so far.
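The |+++⟩ overlap is easy to verify with a quick dense calculation (mine; this checks the overlap value itself, whereas the harder statement from the talk is that |+++⟩ really is among the closest stabiliser states):

```python
import numpy as np

# |CCZ> = CCZ |+++>: flip the sign of the |111> amplitude
ccz_state = np.ones(8) / np.sqrt(8)
ccz_state[7] *= -1

plus3 = np.ones(8) / np.sqrt(8)        # |+++>
F = np.abs(plus3 @ ccz_state) ** 2
print(F, 1 / F)                        # 0.5625 = 9/16, and 16/9 ~ 1.78
```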
The plot is runtime against runtime, and what we're trying to illustrate is the advantage of using CCZ gadgets versus T gadgets, with all other features of the code held constant. We've got this scatter of points going up here; if the two simulation methods performed the same, we'd have equal runtimes, but there's another line here which would be exactly a factor-10 speed-up, and at the moment our numerics are around the regime where we're seeing a factor-10 speed-up. We think we can probably look at larger systems, and then the speed-up would be greater, because the difference between using T gadgets and CCZ gadgets is exponentially separated.

Okay, so I've given you the background material and also the three different types of stabiliser simulators I wanted to talk about. I'm just going to give you a couple of slides now that put all of the numbers together, because the numbers were all on completely different slides, and also to make it clear who did which bits of work, because I realise I've clumped lots of things together into one talk. In some sense this is the prior-art slide. We have the result from a long time ago by Aaronson and Gottesman, which shows the proof of principle that these kinds of stabiliser simulators are possible with some exponential scaling, but with a very large base, 4 to the n or m. Here I've got the T gate, the CCZ gate, and I wanted something else that wasn't a Clifford magic state, just some other gate corresponding to a small rotation, so I picked the square root of T. So that's the proof of principle. Then we have the exact and approximate stabiliser rank results, both from IBM papers, which gave much improved scalings on the exponential overhead of T gates; although the scaling is better for the approximate case than the exact, there is some price you pay in terms of constant-factor slowdowns.

In the work with Mark Howard we conjured up this idea of quasi-probability simulators, and we looked at how they performed for T gadgets and CCZ gadgets; this was the first paper where people started quoting numbers for CCZ. We get some exponential scaling which is not as good as these other ones, which is a little disappointing, but I think there are some advantages to the quasi-probability approach regardless. The two main ones are these. First, it works for noisy circuits: it's inherently defined for things written as density matrices rather than pure states, and not only does it work for noisy circuits, but as you add more noise the simulation gets easier and easier. If you have a very noisy circuit, the simulation is probably not even going to be that difficult using a quasi-probability method. Second, it's one of these embarrassingly parallelisable problems: the overhead is the number of samples we have to draw and then simulate, so we could have many different computers choosing samples independently and add up all the results at the end, massively parallelising it. It's less clear to me to what extent the stabiliser rank methods can be parallelised, although it's definitely worth looking at. Having said that, the exact and approximate stabiliser rank methods have much better exponential scaling.
In this much-alluded-to in-progress, in-preparation work, we fill out some of the gaps here. We look at CCZ gadgets and these other Clifford magic states, but also some other things that don't sit neatly in the Clifford magic formalism. Some notable findings: for exact stabiliser rank, because there wasn't much difference in the rank between a T state and a state at some other angle, the scaling is roughly the same in these two boxes here; whereas in the approximate stabiliser rank case, you see a definite separation between small-angle and larger-angle rotations. As you make the rotation angle smaller and smaller, in the approximate case the simulation becomes easier and easier, though there is some constant-factor price that depends on the level of the approximation.

So that's everything I wanted to tell you about today. Oh no, one more slide: what are the future plans? One obvious future plan is that I keep alluding to a paper that's in progress, in preparation, so we're going to write that. But we have plans beyond that: we would like to make an implementation of all of these ideas. I think of this work as preparatory to actually writing some software; for example, we're thinking about writing these stabiliser rank simulators in a way that they can be used easily within Qiskit, and trying to optimise them as much as possible. So this is preparatory work to settle in our minds what is definitely the best thing to write in Qiskit. Thank you for listening.

Q: Very nice, thank you. One thing I got lost on: you talked about general θ, and at least one of your slides had this slightly surprising result that all θ rotations required the same cost for that method. So over all the methods, if I want to simulate very small rotations, is there a method that allows me to do that relatively cheaply?

A: I think if there are small rotations and there's no noise, then I suggest using this new approximate stabiliser rank method that's being written up at the moment.

Q: Hi, thanks a lot for that great talk. The stabiliser basis that you use in this quasi-probabilistic framework is really nice, but there's a lot of freedom in that framework to potentially choose other bases. I just wondered if you've looked at potentially choosing other high-dimensional bases that would really reduce the negativity?

A: Other frames, you mean? This is something that Mark Howard suggested a couple of times, but no, we haven't looked at any of those.

Q: Thank you. I assume in this chart everything has a big O, and a lot of these numbers, like 1.17 and 1.12, are quite close, and because we're using a classical computer we're not going to be going to T equals a billion, for example. So I'm assuming maybe the constants might make a difference?

A: For all of these, the constants would be roughly the same, but if you go between the exact stabiliser rank and the approximate stabiliser rank, there is a constant that matters.

Q: At the beginning you said that matrix product states can do 50 qubits or something like that, and I wonder how far you've tried to push this?
A: Yeah, I also wonder, you're also wondering, it'll be very exciting to see. But maybe I can comment on some differences. In the matrix product state approach, even CNOT gates and Clifford gates have some cost associated with them, whereas here they have zero cost. You're also limited in terms of the topology of the device. If you're looking at one of these random nearest-neighbour simulations of a superconducting circuit which is locked into a 2D topology, it may well be the case that matrix product states are the best simulator you can hope for. But if you have some other topology, or you find that for whatever reason Clifford gates dominate, for instance, if you're benchmarking some fault-tolerance protocol, and there are loads of Clifford gates throughout the fault-tolerance protocol, then it's going to be much, much easier to use a method where the Clifford gates are free. In terms of how far we think we can get: there are already simulations that have done circuits with, say, 45 or 50 T gates in them. But honestly, you shouldn't take that too seriously as a limitation of the methodology, because if you look at this recent IBM paper where they do 56 qubits using matrix product states, they're using state-of-the-art supercomputing hardware and very, very optimised code, whereas this is code put together in MATLAB and run on a laptop. So in terms of future work, one of the questions is definitely: what's the most you could possibly hope for with highly optimised code run on a supercomputer? We just don't know.

Q: There was a poster yesterday that did 60 qubits with matrix product states.

A: Okay, that sounds good; I should have put 60 at the beginning of my talk instead of 56. I would just be careful with these things, though, because especially for matrix product states it depends on the depth. I put 56 because I think the depth there was something like 23, somewhere in the 20s; so it also depends on the depth.

Chair: No other questions? Maybe one quick last one... All right, if there are no other questions, let's thank Earl again.