everyone. We will have Frank Wilhelm's talk now, but afterwards there is a poster session, and I just wanted to announce that the poster session will be just outside the room, on the wall basically facing the back of this corridor. You may have noticed it, but in case you haven't, I wanted to announce it. Thank you.

Okay, thanks a lot. Do people hear me? Yes? Okay, very good; probably the only thing that can go wrong is that I am too loud. So this is a picture of Jülich, which has a water feature that is exponentially less beautiful than here, so I would like to thank the organizers for inviting me to talk about controlling microwave qubits, and maybe at the end a bit of optomechanics, even though I probably won't have time; let's see. The clicker used to work; it doesn't work anymore, but that's fine, I don't have animations. Well, the presentation is completely frozen. That is not so good. Actually the computer seems to be completely frozen. No, I'm just stopping the screen share, and now I am screen sharing again. Making a fool of yourself by applying technology is still a very popular discipline.

Okay, very good. I would also like to thank the organizers for tolerating an off-topic talk. I got this wonderful invitation, and when Joachim asked me whether I would come or not, I told him: thanks for inviting me, but on the topic of the conference I have nothing to say. He felt this would still be interesting; however, if the talk is badly given, he is not to be blamed for it, that is completely on me.

So I will give a bit of a ramp-up: I think half of the room has probably heard my intro to optimal control for superconducting qubits a few times; for the others, it will be a refresher. Then I will talk about a key ingredient for this, which is system identification by Bayesian learning, also known as spectroscopy on steroids, and then how, with both things together, you can actually make calibrating your qubits a lot faster, more efficient, and more precise, which is specifically important when you scale up.

So what is the goal of gate design? Well, here is a chip; somehow this is still one of the sexiest pictures of a transmon chip that you can use. We have some control electronics, and then we really want to implement the gates in a quantum circuit with very high accuracy, and we want to make them fast, because, at least for the completely Markovian, completely white part of the noise, the best strategy for avoiding decoherence is basically to run away from it. Secondly, you want to minimize gate error from all kinds of other sources: for example, from not being at the optimal working point, from unitary errors, from spurious degrees of freedom that are themselves very coherent but still perturb your system. And you want to make this work under realistic conditions. We know a lot of intuitive recipes for this, like Rabi oscillations and various refinements, but ideally, the time-dependent Schrödinger equation gives you so many degrees of freedom that you would like to apply some systematic mathematics.

Specifically, as already mentioned, when you scale up, the time it takes to calibrate a quantum processor with more than 10 qubits is significant. If you went to John Martinis's talks when he was still at Google, he showed these giant pictures, which were never published (we have one as a photograph from a conference), with a directed acyclic graph of about 40 nodes and an average labor input of 0.5 PhD theses per node to make this happen.
This is very impressive, but I think this complexity also shows that maybe the method should be rethought: what he did was take all the experimental knowledge, which of course is immense in this area, and program it better, but maybe we should do it with less intuition, if you want.

Now, to set the stage: this is for situations where we have a lot of spectral crowding, and it's interesting because it works and because it helps us understand why generalizations usually don't. This is the DRAG method, which probably half of the room has by now as a subroutine in their control electronics. It is based on the evolution of qubits over, say, the first roughly 10 years, when, in becoming more and more modern, from the Cooper pair box and the flux qubit and the quantronium to the transmon, we had very good reasons to become less and less nonlinear, so to build systems that are basically weakly anharmonic oscillators. The problem is that with a weak nonlinearity you get spectral crosstalk: when you drive 0 to 1, your 1 to 2 transition is not so far detuned, so you start driving that as well, and you can prove that a driven harmonic oscillator has a semiclassical propagator, so essentially all quantum dynamics breaks down; you can only do semiclassics. This essentially limits your bandwidth, which goes as 1/t_g: your gates need to be spectrally narrower than this crosstalk, unless you do something clever. And it was shown a long time ago, in this typical hockey-stick plot, that at long gate times decoherence is eating you up, so you want to go shorter and shorter to beat white noise, but at short gate times you have a unitary problem, because you are now probing your qubit at a bandwidth that encompasses the leakage transition, and that is limiting you.

DRAG, which many of you know, is the solution to this. We discovered it numerically and then came up with various retroactive analytical explanations. You still drive your system resonantly with cos(ωt) times, for example, a Gaussian envelope, but then you drive a phase-shifted tone, a phase-shifted signal, as well. I think IQ mixers were mentioned yesterday; this actually happened when my postdoc Jay Gambetta told my graduate student Felix Motzoi: experimentalists have this thing called IQ mixers, we can shape the second quadrature for free. That was a long time ago, but in any case, when this second quadrature satisfies the right differential relation, being proportional to the time derivative of the envelope, you are actually suppressing leakage. Here you see plots from the same two groups showing, for example, that with DRAG you get a lot less error than without: this is the fidelity from randomized benchmarking, which with DRAG holds up pretty nicely. And if you have a second qubit nearby, you can use a slightly more complicated strategy, where you have DRAG (what is called u here is called Ω there) and you need a Gaussian pulse shape with an additional modulation to make it work, and this also works pretty well; here you see that this strategy, which we call WAHWAH, reduces the error.

What these strategies have in common is that the workflow for experimentalists is very simple: here is the nonlinearity, and here are things you can essentially take as an inspiration, put on the control electronics of your experiment, and fiddle around with until it works. And it does work, because these are pulses that have very low complexity.
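To make the derivative rule concrete, here is a minimal sketch of first-order DRAG envelopes; this is my own illustration, not code shown in the talk, and the gate time, anharmonicity, and DRAG coefficient are placeholder values:

```python
import numpy as np

def drag_envelopes(t_gate=20e-9, n=201, amp=1.0,
                   anharm=-2 * np.pi * 300e6, drag_coeff=1.0):
    """First-order DRAG: in-phase Gaussian envelope plus a quadrature
    component proportional to its time derivative, scaled by the
    anharmonicity (placeholder parameter values)."""
    t = np.linspace(0.0, t_gate, n)
    sigma = t_gate / 4.0
    gauss = amp * np.exp(-0.5 * ((t - t_gate / 2.0) / sigma) ** 2)
    i_env = gauss - gauss[0]                             # zero at the edges
    q_env = drag_coeff * np.gradient(i_env, t) / anharm  # Q from dI/dt over the anharmonicity
    return t, i_env, q_env

# The physical drive is then I(t)*cos(wt) + Q(t)*sin(wt) on the two mixer ports.
t, i_env, q_env = drag_envelopes()
```

This is exactly the "second quadrature for free" point: the same IQ mixer that produces the Gaussian can produce its scaled derivative on the other port.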
Now, here is a result from numerical optimal control that looks like WAHWAH on steroids. What are we doing? We have two qubits that are very weakly coupled and happen to have a frequency collision, so two frequencies are the same: for example, the 0 to 1 transition frequencies of qubit A and qubit B coincide. Then you are basically stuck, but you can come up with optimal-control pulse shapes, with a monochromatic carrier and a rather complex envelope, which still allow you to do something. This is the pulse shape for a CNOT, and, you know, it's complicated; that's the only message. Even idling is complicated. And think about this: IBM has fixed-frequency qubits and fixed couplings, and because these frequency collisions can happen, because fabrication is never completely precise, they have this heavy-hexagon architecture with relatively low connectivity to keep the risk relatively low. Now, wouldn't it be nice if we could use these pulse shapes instead? This is an extreme example, but comparing this slide with the slide before, you see what the problem is: this is very hard to calibrate with the experimental toolkit, we don't understand why it looks like this, and it is based on having a good model. So this is why we need to think a little bit about models.

Here is a bit of a technical slide, but let's not make it too brutal. What we would like to do (I forgot the integral, and every time I show this slide I have to remember to put it there, so there is a time-ordered integral that goes here) is to get the time evolution operator of the time-dependent Schrödinger equation to do what it is supposed to do. We have a Hamiltonian with a drift, the untunable terms, and with external fields, and we can essentially run a gradient search on the pulse shape, with gradients we can compute in closed form, to minimize the gate error; that tells us how to update the external control fields to make the evolution do what we want (a toy sketch of such a gradient search appears below). If we do this, we get something complicated, and the first problem is that we assumed the Hamiltonian is actually known. Historically this was introduced for NMR and then used in atomic physics, where we do know the Hamiltonian pretty well because we have precision spectroscopy, but in our case every qubit is different, so we have to do something else. So (a) these parameters are all unknown, and (b) there are degrees of freedom, H_junk, that are also sitting there and that you don't quite know. Those can include fabrication uncertainty, for example a slightly different qubit-to-bus coupling, and they include the transfer function from your signal generator to your qubit, which is also tricky to measure, because you would really like to measure it with the qubit itself; keep in mind that in quantum computing we want four nines of precision at the end of the day.

One way this has been solved in superconducting qubits is by the Yale group. And my computer is now stuck again; that is embarrassing; no, I can't do anything, I can't even move this; okay, it's working again, I shouldn't press the button so often, maybe it's commenting on my nervousness. So the Yale group has shown that you can actually do this numerically, for example in a 3D transmon, which (a) is a very, very clean system, and (b) in most of their work they are looking at still relatively long pulses, like 500 nanoseconds, which means they need a good model and only a relatively limited bandwidth.
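As an illustration of that kind of gradient search, here is a minimal sketch in the spirit of GRAPE-style pulse optimization; a finite-difference gradient stands in for the closed-form gradient the talk refers to, and the target gate, slice count, and step sizes are arbitrary choices of mine:

```python
import numpy as np
from scipy.linalg import expm

# Piecewise-constant controls on one qubit in the rotating frame.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)

H0 = np.zeros((2, 2), dtype=complex)         # drift (on resonance)
U_target = expm(-1j * (np.pi / 4) * sx)      # target: a sqrt(X)-type rotation
n_slices, dt = 20, 0.05                      # 20 slices of duration dt

def propagate(controls):
    """Total propagator for per-slice (x, y) drive amplitudes."""
    U = np.eye(2, dtype=complex)
    for ax, ay in controls:
        U = expm(-1j * dt * (H0 + ax * sx + ay * sy)) @ U
    return U

def infidelity(flat):
    U = propagate(flat.reshape(n_slices, 2))
    return 1.0 - abs(np.trace(U_target.conj().T @ U) / 2.0) ** 2

rng = np.random.default_rng(0)
x = 0.1 * rng.standard_normal(2 * n_slices)  # random initial pulse
eps, lr = 1e-6, 0.5
for _ in range(300):                         # plain gradient descent
    grad = np.array([(infidelity(x + eps * e) - infidelity(x - eps * e)) / (2 * eps)
                     for e in np.eye(x.size)])
    x -= lr * grad
print("final infidelity:", infidelity(x))
```

The point of the sketch is only the structure: a parametrized pulse, a fidelity functional built from the time evolution operator, and a gradient-driven update of the control fields. Everything downstream depends on H0 and the control operators being the right model, which is exactly the problem discussed next.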
So for really going fast this is a good starting point, but it's not the end of the story. What you can do, and I will soon come to the new material, is basically to take what I have just explained: you design your system, you characterize it, you do this open-loop gradient search, and then you do a closed-loop calibration, where you measure fidelity by randomized benchmarking and readjust the pulses, a bit like a person at a mixing desk (this is a public-domain picture, so at the ICTP we can show it), and fine-tune the pulse. This works; it can be done, we published it a couple of years ago, and some labs use it. But the problem is that it's cumbersome; it takes a long time, so you want to avoid it as much as possible, and for that you want a good model. So now we essentially need to think: how can we do extremely good spectroscopy, how can we get a model that is good enough to take us to four nines?

Here is a starting point. This is also somewhat old work (I just moved, so my papers are all a bit old); it is called system identification, and it's based on Bayes' theorem. Who of you knows what Bayesian probability is? That is more than half of the room, brilliant, so this is a repeat. Basically, Bayesian probability is an interpretation of the concept of probability that is complementary to the frequentist interpretation, and there are a lot of rather sectarian discussions among probability theorists that you can safely ignore. It interprets probability as your degree of knowledge, and the weather report for Trieste is a good example: for a frequentist, who needs to repeat everything many times, a precipitation probability of 15% makes no sense, while in the Bayesian reading it basically tells you: we know enough about today's weather to say that on about 15% of the days where we had the same data, it rained. As the day progresses, this gets updated, because we learn more about what is going on, so the probability becomes adjusted, and the confidence in that probability also grows.

This is all based on Bayes' theorem, which is written here; it is elementary to prove from joint probabilities, and it is made out of weird symbols. What it means is this: X is a parameter you would like to measure, D is data you have taken, for example the observation of the weather up to now, and X is "does it rain today". The probability of X given D is proportional to the probability of D given X times the prior knowledge, the probability of X: P(X|D) ∝ P(D|X) P(X). For us in quantum mechanics, D given X goes as follows: we can use the quantum state and the postulates of quantum mechanics to predict the outcome of a measurement, which in the non-degenerate case is the overlap with state n, P(n) = |⟨n|ψ⟩|². If our state ψ depends parametrically on X, where X is, for example, something in your Schrödinger equation, then this overlap is P(n|X), and n is an example of data. So by the rules of quantum mechanics we can compute this, and when we measure something we can update our knowledge: if we measure something that is already likely, it confirms our distribution; if we measure something unlikely, maybe we have to change the distribution a lot. P(D|X) is called the likelihood, P(X) is the prior, what we knew before, and P(X|D) is the posterior, what we know afterwards. This appeared in some work that back then was mostly aimed at a theoretical audience, but we have now used it in more advanced tools, and I think it's time to wake it up.
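As a minimal sketch of such a Bayes update (my own toy example, not from the talk): an unknown detuning is learned from single-shot clicks whose Born-rule probability serves as the likelihood; the grid, waiting times, and "true" value are arbitrary placeholders:

```python
import numpy as np

# X: unknown detuning on a grid; D: single-shot clicks from Ramsey-type
# experiments. Posterior ~ likelihood * prior, renormalized after each shot.
delta = np.linspace(0.0, 10.0, 2001)          # candidate detunings
posterior = np.ones_like(delta) / delta.size  # flat prior P(X)

def p_click(d, t):
    """Born-rule probability of measuring the excited state."""
    return 0.5 * (1.0 - np.cos(d * t))

true_delta, rng = 3.7, np.random.default_rng(0)
for t in (0.05, 0.1, 0.2, 0.4, 0.8):          # a fixed measurement schedule
    for _ in range(20):
        click = rng.random() < p_click(true_delta, t)
        like = p_click(delta, t) if click else 1.0 - p_click(delta, t)
        posterior *= like                     # Bayes update
        posterior /= posterior.sum()

print("detuning estimate:", delta[np.argmax(posterior)])
```

Likely outcomes sharpen the existing peak; unlikely ones reshape the distribution, exactly as described above.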
We realized pretty early that this turns into a feedback algorithm: you optimize some measurement settings, you measure, you update, you get a new likelihood, and after one experiment you feed the posterior in as the prior for the next experiment, and you repeat. Here is an example, namely swap spectroscopy, chevron patterns, which in many cases is one step of tune-up. We have a qubit, which we assume is frequency-tunable, and a resonator of unknown frequency; that can be a resonator you made, or a two-level system, a two-level resonance, that you would like to understand better. If you hit the resonance you get a coherent oscillation with large amplitude, and the coupling strength g is given by the oscillation frequency there. This has a few features: for example, when you realize you are here, you should really stop taking data there and rather measure close to the areas of high contrast. And when you take this probability and simulate doing just one shot per setting, you get a dither, like old newspaper prints: even though this is very low resolution and has a lot of noise on top, you can make out the figure, you can see what's going on.

So what we came up with, in a paper from a while ago, was a method with a measurement schedule like this; each of these points is one attempt at a quantum measurement, so click or no click. Yes, I know that for current control electronics this is impractical; this is a very theoretical paper. Then you move on, and let me explain this more on the next slide. First of all, you update your prior and your posterior, and every now and then you choose a different setting of various parameters; let's not go into too much detail here and rather focus on the next slide.

So this is again swap spectroscopy, and the idea is the following. Initially we know that we have this resonator somewhere: we have a vague idea of the frequency band and a vague idea of the coupling strength; that is our initial prior. Then we take about 15 data points essentially at random, we go on a random fishing expedition, so by Bayes updates our prior gets a little narrower. After this we do the following: we look at what is, at the current step, my best guess for the resonance frequency, or rather for the detuning, and we go to zero detuning according to that best guess; we also use our guess for the coupling strength, and then we do adaptive interferometry. We check the standard deviation of the coupling strength, and we want to go far enough out that the phase difference, which is essentially k times the frequency difference if we go out by k oscillations, gets large: to resolve a tiny frequency difference, we want a large k. But given that we also know the standard deviation of the coupling (the symbol on the slide should really be a Δg), we should not go too far out, otherwise we could miscount the number of oscillations. This lets us step outward rather quickly, and you see that with no noise, in about 150 measurement shots, you would get to extremely low error. Now, this has a lot of assumptions; these are all curves with noise, and then it's not as good. Still, this shows that, in principle, we can characterize our system very data-economically.
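Here is a minimal sketch of that adaptive rule, again a toy version of my own: the next waiting time is chosen from the current posterior spread, so the accumulated phase uncertainty stays around one radian and fringes cannot be miscounted; the grid, swap-probability convention, and "true" coupling are placeholders:

```python
import numpy as np

# Adaptively learn a coupling g from resonant vacuum-Rabi swaps: the waiting
# time grows as the posterior narrows, keeping std(g)*t of order one radian.
g = np.linspace(2.0, 8.0, 4001)               # candidate couplings
posterior = np.ones_like(g) / g.size
true_g, rng = 5.3, np.random.default_rng(1)

for shot in range(150):
    mean = (g * posterior).sum()
    std = np.sqrt(((g - mean) ** 2 * posterior).sum())
    t = 1.0 / max(std, 1e-4)                  # long enough to resolve, short enough not to skip a fringe
    click = rng.random() < np.sin(true_g * t / 2) ** 2
    like = np.sin(g * t / 2) ** 2             # swap probability in this toy convention
    posterior *= like if click else 1.0 - like
    posterior /= posterior.sum()

print("g =", (g * posterior).sum(), "+/-", std)
```

As the posterior narrows, t grows, which is the "stepping outward" described above: each shot interrogates the system at the finest phase resolution the current knowledge safely allows.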
For us, we can also do this, for example, for finding two-level systems. Here is a situation with two two-level systems, where we can even test various hypotheses, so we can also count them and figure out how many of them we need for our data to be satisfactorily described. There is the statement by the statistician George Box that all models are wrong but some are useful, so here the idea is to count as many two-level resonators as we need to get high precision, but not all of those that are on our chip, which would be extremely widely detuned. Here you see scenarios: for example, d = 2 means we have two of these resonators, and P(d ≥ 1) means we are checking that we have at least one, which converges quickly; but if we claimed we don't have more than one, that probability would go down after a couple of measurement shots. So we can also count things; this is a kind of fitting on steroids (a sketch of this style of model comparison follows below).

We have taken these tools together, optimal control and system identification, and came up with the following circular process. We start with an initial model, something we have designed, and we do some initial, very simple characterization to find a good set of pulses with that model. Then we construct gate sequences and we evaluate those pulses with the optimal-control methods from the first part of the talk, first model-based, then model-free. After this we check: now that we have learned something from our calibration, how do we have to update our model? Our calibration data may tell us that we have some extra resonators, or that some parameter was slightly wrong, and with this we update our database, so that the next time we calibrate we don't have to do it all over again. Together with a number of really smart software developers, and tools like TensorFlow, we have programmed this in an AI framework with automatic differentiation. There are now two branches of this: we test it with various of our friends, but we have also started a company (the name used to be on the slide, but the layout is not great): it is called Qruise, like cruise control for quantum computing but starting with a Q, at qruise.eu.
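Picking up the counting idea from above, here is a minimal sketch of Bayesian model comparison, my own toy construction rather than the talk's actual method: the evidence P(D|model) is obtained by marginalizing the likelihood over each candidate model's resonator positions, so a model with too few resonators fits badly and a model with too many pays an Occam penalty; all signals and grids are synthetic placeholders:

```python
import numpy as np
from itertools import combinations

# How many two-level resonators does a probe trace support? Compare candidate
# models by their evidence P(D|model), marginalized over resonator positions.
f = np.linspace(4.0, 6.0, 200)                       # probe frequencies
rng = np.random.default_rng(2)

def response(freqs, centers, width=0.05):
    """Sum of Lorentzian lines, one per hypothesized resonator."""
    return sum(1.0 / (1.0 + ((freqs - c) / width) ** 2) for c in centers)

data = response(f, (4.6, 5.2)) + 0.1 * rng.standard_normal(f.size)  # two TLSs

def log_evidence(n_res, grid=np.linspace(4.0, 6.0, 40), sigma=0.1):
    """Marginalize a Gaussian likelihood over all placements of n_res centers."""
    logls = np.array([-0.5 * np.sum(((data - response(f, c)) / sigma) ** 2)
                      for c in combinations(grid, n_res)])
    return logls.max() + np.log(np.mean(np.exp(logls - logls.max())))

for n in (1, 2, 3):
    print(f"{n} resonator(s): log evidence = {log_evidence(n):.1f}")
```

On this synthetic trace the two-resonator model wins, which is the George Box point in miniature: keep exactly as many resonators as the data demands, and no more.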
So how does that look? Let's go back to an example from the beginning of the talk. This was one of the last things Stefan Filipp did before he left IBM. We went back to DRAG; once you have a hammer, everything looks like a nail, of course. This DRAG rule, the derivative, is the first order of a long perturbation series, but the higher orders become more and more sensitive to the details of your Hamiltonian, because you can start asking: you have decoupled from the third level, but what about the fourth and the fifth? These are all legitimate questions, but you shouldn't push the precision of your calculation further than your model. So we applied those tools and were able to show the following. We start from an initial DRAG pulse, the blue line: this is a Gaussian, with the discretized time derivative of a Gaussian on the other quadrature. When we ran it we got a decent error (this is the population of the leakage state), but at about a hundred gates the gate became bad. Then we ran this through the machinery of simultaneous numerical optimal control and system identification, and the pulses got better: a much slower degradation of the quality. The pulses became less intuitive (there is this extra bump here, and the second quadrature looks all over the place; it is not precisely the derivative anymore), but they worked really well.

So these are the DRAG results, and you see that as you get shorter and shorter you run into a reasonably sharp curve, because, remember, DRAG does not help you beat the Nyquist theorem; it doesn't help you beat the bandwidth limit, it just helps you go right up to the bandwidth limit, rather than having to stay an order of magnitude above it. And we saw that with our pulses, which are piecewise constant, we could go all the way, with not really a systematic trend (this one is a little worse, this one better again), until a sharp time. We were kind of proud that we understood the quantum part so well that we could identify the time at which the microwave signal amplitude was so high that the input cable started making problems: it became nonlinear in an unpleasant way. So we discovered an unexpected limitation. This is a very simple setting, but it is encouraging that the method may be useful.

Here is a synthetic example, where we essentially run a numerical simulation of a system and do full model matching. We use what is called a model-matching score, which is essentially a type of standard deviation, and we have a black-box model that includes a relatively complete description of, in this case, two coupled transmons. We can model them as uncoupled Duffing oscillators, which must clearly be wrong; we can add the coupling; but we can also add all the bells and whistles beyond the Duffing oscillators. And we see that the standard deviation gets stuck for all models besides the full one. So this validation on synthetic data tells us that we can actually find out that even the coupled-Duffing-oscillators model is incomplete.
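For concreteness, here is a minimal sketch of what such a hierarchy of candidate models might look like; this is my own illustration, and the dimensions, frequencies, anharmonicities, and coupling value are placeholders:

```python
import numpy as np

# A hierarchy of candidate models: two Duffing (transmon-like) oscillators,
# first uncoupled (clearly wrong), then with an exchange coupling g(a†b + ab†).
def duffing(dim, omega, alpha):
    """Single Duffing oscillator: omega*n + (alpha/2)*n(n-1)."""
    n = np.diag(np.arange(dim)).astype(complex)
    return omega * n + 0.5 * alpha * n @ (n - np.eye(dim))

def two_transmon_model(dim=3, omegas=(5.0, 5.4), alphas=(-0.3, -0.3), g=0.0):
    """Joint Hamiltonian (units of 2*pi*GHz), truncated to `dim` levels each."""
    a = np.diag(np.sqrt(np.arange(1, dim)), 1)            # lowering operator
    I = np.eye(dim)
    H = (np.kron(duffing(dim, omegas[0], alphas[0]), I)
         + np.kron(I, duffing(dim, omegas[1], alphas[1])))
    return H + g * (np.kron(a.T.conj(), a) + np.kron(a, a.T.conj()))

H_uncoupled = two_transmon_model(g=0.0)       # the model whose score gets "stuck"
H_coupled = two_transmon_model(g=0.005)       # 5 MHz coupling switched on
```

Model matching then amounts to fitting each candidate's parameters against the data and watching whether the residual score keeps improving or saturates; a saturated score flags a missing term, like the coupling here.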
You can also do this in what is more like simulating a full tune-up. Here we have the resonance frequencies of transmon A and transmon B, the nonlinearities of transmon A and transmon B, the coupling strength, a parameter of the transfer function, the fridge temperature, and the excitation and relaxation rates, and basically you are not doing anything you couldn't do by hand, but you are doing it all at the same time, in about 150 iterations. So you realize, for example, that if you initialize by thermalization, your fridge temperature was slightly wrong: you thought it was 55 but it was really 50. At the least this shows that the method is data-efficient, and we are of course now trying to do this with a lot more systems. I will probably not talk about the wee bit of optomechanics; that is a big change, my group has started to do optomechanics, and we are happy to discuss it. So I hope I could convince you that the optimal control we have been working on for 15 years has been coming of age, in the sense that after being mostly theoretical, or applied to quite simple systems, it can now really be applied, and that the calibration problem of quantum computers, even if you are not going to fancy pulses, can be addressed if you combine this with the right AI tools. With that, thank you very much for your attention.

So thank you, Frank Wilhelm. Do you have a sense, or results, on how the computational complexity, the algorithmic complexity, scales with the number of qubits if you are trying to do this Hamiltonian learning?

It is all based on the assumption that if we have a large quantum processor, we can at some point tile it into tiles that are overlapping, but where tiles that do not overlap are independent of each other; if that doesn't hold, quantum computing with superconducting qubits is probably a dead end. Within this, we have not done any systematic scaling. In the worst case it is exponentially hard, but limited by the size of those tiles; however, given that we don't want infinite precision, arguments like "this degree of freedom can be neglected because it is far detuned" still hold. As for the practical computational effort: my startup package was medium generous, and we can run a lot of these things on our cluster at the same time. I mean, ultimately it is exponentially hard, but with those assumptions I think it's manageable.

Thank you for a very nice overview and a great talk. Often in these systems there is the assumption that there is some static, though noisy, reality that you need to somehow learn and can then react to, but we see that some subtle aspects of our systems have 1/f drift, which is non-stationary on the time scales on which you are trying to calibrate. What are your thoughts regarding this?

Ultimately, I think people who use cloud quantum computers actually know at which point of the day the machine is calibrated and try to send their jobs then. So practically you need to recalibrate during the day, because things drift, within and between days. The goal here, towards which we make progress, though there is no last word, is that once you have run this a couple of times, your modeling and your database of calibration results are large enough that small adjustments can be made quickly; you don't need to do a full bring-up. Now, of course, with true 1/f noise you have the occasional catastrophic jump, against which you are powerless, but if it is localized you can at least reduce the pain; you cannot remove the problem. I can say one thing about this strategy: a young master's student of mine explained to me that he had found an absolutely brilliant method, two π/2 pulses separated by a delay, to measure T2, and I congratulated him: he was way too late to get his Nobel Prize, Norman Ramsey already had it. But this tells you, and this is also kind of fun, that this actually works even when people just haven't read the relevant papers.

A question on slide 20, I think; this next slide. You had the fidelity of about three nines; it is a flat line with the beyond-DRAG pulses. Why is it flat? Why isn't it getting better when you make the pulse shorter?

Because of the T1 limit.

Is the T1 limit four nines, four and a half?

Yes, so this is the T1 limit, and this is already in the regime where you are limited by adjacent levels; the sample simply came out like this, you know, it was not a Unimon. And this is the typical hockey stick: at some point, when you are clearly running out of bandwidth, with optimal control you get a sharp wall rather than a gradual transition, and we could simply push closer to that sharp wall.

I was surprised the red dots are really flat; I would have assumed that somehow there is something...

You know, we didn't reach the quantum speed limit, and T1 was so long that, you see, we are significantly worse than the T1 limit; you would have to get much closer to it to actually see T1 effects. So it was a very clean qubit noise-wise, but not a qubit with a very clean spectrum. And one comment that should have made it onto the slides: the idea that in an experiment with no obvious flaw you could still get an error budget is, we think, very appealing. Once you have an experiment with no obvious flaw and it still makes errors, you want to know where the residual errors are coming from.
To my understanding this is hardly published, and to my understanding it is also extremely difficult, and it is something we hope to achieve with this.

Good, since there are no more questions, let's thank Frank Wilhelm again. There is a coffee break now, if I understand correctly.