Okay everyone, I'd like to introduce our next speaker, Ryan. Ryan is a PhD student at Michigan State University, where he is pursuing a dual PhD in physics. Sounds pretty stressful. He's interested in both the physics of computation and the computation of physics, so basically: what can we learn from quantum physics about computation, and how can we use quantum computers to understand physics better? He's the creator and maintainer of the quantum open-source project NISQAI, and today he's talking about quantum classifiers, robust data encodings, and software to implement them. So please welcome Ryan.

Can everyone hear me? I think this is on, okay. Yeah, thank you. Thanks, Mark, and thanks to the organizers for this great event. Today I will mainly be talking about three related things. The first is our library, NISQAI. The second is some research that spawned off of this, about data encodings and robustness, which I'll get into. And the third is how anyone can contribute to NISQAI if you're interested.

So here is our GitHub. NISQAI, we say, is an open-source framework for quantum machine learning in the near term, one of several now. We have a somewhat different approach than the others. We're in Python, but we're targeted only at Rigetti's Quantum Cloud Services. Instead of being hardware- and backend-agnostic, we're writing only for this backend, which has very promising capabilities for near-term quantum computation, such as dedicated access time, active qubit resets, and all these nice things. So far our library is still in an alpha stage. We have several examples of quantum classifiers, one of which I'm going to go through in a moment, and we're looking to add some kernel methods for quantum machine learning in the near future. We hope to get ourselves on PyPI at some point this year.

Now, a high-level overview of our library. On the left, we have tools for representing data: CData for classical data and QData for quantum data. Classical data is what you normally have; quantum data you can think of as something like a Hamiltonian describing, say, a molecule. Then we have tools to design and put together a so-called quantum neural network. We have an encoder, which is something I'll talk about a lot in part two; this is how you map your input data into a circuit. There are ansätze for evolving the state, and predictors and measurements for extracting information from that circuit. We also have tools for defining losses, and trainers for optimizing, the normal things you need in machine learning libraries, plus things that help you run on the backend.

So now I'd like to briefly go through one of the examples that we have. The problem we're going to consider is a classification problem, but in the quantum world. If you're not familiar: we input a bunch of feature vectors, each of which has a label, and we want to output some sort of, so to speak, intelligent quantum machine, which can hopefully classify all these features correctly and make predictions on new data that it hasn't seen yet. "Quantum" is in blue here because it's the same problem as in the classical world; we're just using quantum resources to do it. So this is one example that we have. Here's the link if you want to look at it on GitHub, and I'll just walk through the code briefly.
So the first thing we do is import the library and then start what we call an engine. An engine is a connection to Rigetti's QVM and Quil compiler, which you need to run locally on your computer; the image down here is a graphic that Rigetti has made to show these things. This is nice because you can do it within your script, so you don't have to start another terminal and so on.

The next thing we do is get some data. Here we're just going to use random data which can be separated by a horizontal line; that's line five here. Then, if you want, we have tools to visualize data. If you run this, you'll see a plot that looks something like this: the top class colored green, the bottom class colored blue, and this line showing the decision boundary between the two.

So here's the point where we get to the encoder. The encoding we use here is one example of several that we have in the library; this is what we call the dense angle encoding. The dense angle encoding maps two features into a single qubit. Here the features are x1 and x2, and they get mapped into the qubit written in the lower left as shown. On the lower right is a graphical representation of what's happening: you're taking classical data that lies in the unit square and putting it on the surface of the Bloch sphere, on our qubit here. The decision boundary you start with is the one you see here, and the goal of the classifier is to train this to correctly classify all the input data.

That's what we do in a few more steps. We first need to define what our ansatz is. On a single qubit, all ansätze are essentially the same, so we use a product ansatz here, which is parameterized by two angles. Then we add a measurement to the circuit; the way we get our classification is by doing this measurement. Once we have these three things, the encoder, the ansatz, and the measurement, we can form them into a quantum neural network, and this is how you do that here. We also provide a computer, which here is the one-qubit QVM by Rigetti, and something which we call a predictor, which is how you map from the raw measurement outputs to a prediction. And this is the quantum circuit that this code represents.

Once we have this, now we can train. I'll just go through this quickly; this is the standard thing in machine learning. And once you train, you can get all the predictions, and in this case you are able to correctly classify all of the data.

So that's one example that we have; there are a few others. There are just a few things I want to note. This is an implementation in fewer than 20 lines of code, and most of them were comments, so really even fewer. That's nice. It's modular, so you can play around with this and easily extend it to other models: you can add more qubits, change the encoding, change the ansatz, and test what sort of performance you're able to get. And if you notice, there are very few references to how you're actually running the underlying circuits and programs; we try to take care of that mostly under the hood. You do need to say, at least right now, which computer you want to run on, but maybe even that could change in the future.
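Since the slides aren't reproduced here, below is a minimal, self-contained NumPy sketch of the same pipeline. To be clear, this is not the nisqai API: every name in it is made up for illustration, and the exact angular scaling of the dense angle encoding is my assumption. It just mirrors the steps described above (encode, apply the ansatz, measure, predict, train) so the moving parts are visible.

```python
# Plain-NumPy sketch of the single-qubit classifier pipeline (NOT the nisqai API).
import numpy as np

def dense_angle_encode(x1, x2):
    # Two features -> one qubit: cos(pi*x1/2)|0> + exp(2*pi*i*x2) sin(pi*x1/2)|1>.
    return np.array([np.cos(np.pi * x1 / 2),
                     np.exp(2j * np.pi * x2) * np.sin(np.pi * x1 / 2)])

def ansatz(theta, phi):
    # Two-parameter product ansatz: Ry(theta) applied after Rz(phi).
    rz = np.diag([np.exp(-1j * phi / 2), np.exp(1j * phi / 2)])
    ry = np.array([[np.cos(theta / 2), -np.sin(theta / 2)],
                   [np.sin(theta / 2),  np.cos(theta / 2)]])
    return ry @ rz

def predict(x, theta, phi):
    # Measure in the computational basis; class 0 iff |0> is the likelier outcome.
    amp = ansatz(theta, phi) @ dense_angle_encode(*x)
    return 0 if abs(amp[0]) ** 2 >= 0.5 else 1

# Random data in the unit square, separable by the horizontal line x2 = 0.5.
rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = (X[:, 1] > 0.5).astype(int)

# "Training" by brute-force grid search over the two ansatz angles.
angles = np.linspace(0, 2 * np.pi, 25)
acc = lambda t, p: np.mean([predict(x, t, p) == yi for x, yi in zip(X, y)])
theta, phi = max(((t, p) for t in angles for p in angles), key=lambda tp: acc(*tp))
print(f"best training accuracy: {acc(theta, phi):.2f}")
```

A gradient-based optimizer would replace the grid search in any real setting; the grid is just the shortest honest stand-in for the training step described in the talk.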
And so this is nice for people who may have more of a machine learning background than a quantum computing background, since it could allow them to get into the space.

Okay, so that's the first part. The second part is some recent research that I want to talk about. This is currently unpublished; I hope it will be on the arXiv soon. This is work I've been doing with Brian Coyle, who is a PhD student at the University of Edinburgh and someone I actually met at FOSDEM last year. He made contributions to NISQAI, and then when we were working on this research project he joined and had a lot of great ideas.

What I want to do first is formally define, at a mathematical level, the quantum classifier model we're considering. The first thing, like I said, is we start with a data point x, which is our feature vector, and we map it into a quantum state ρ_x. This is the encoding. After this, we evolve with some unitary, and then we make a prediction. In the lower right you see the example for one qubit, which is what we saw, and at the top you can generalize this to more qubits. The way we get the prediction is by measuring a single qubit, and we do the classification as follows: if we measure the qubit as zero more often than one, we assign it to class zero. You could call it anything, but call it class zero. And we assign it to the other class, class one, if we measure one more often than zero.

Okay, so this is a standard model considered in a lot of recent literature. Perhaps the first instance is this paper by Farhi and Neven from, I forget the year, but it's fairly recent. They use the same structure: a unitary circuit, and then they measure a single qubit to get their prediction. So there are these three points they go through: again, encode the data, evolve with the ansatz, and then measure a qubit. There's other work that considers very similar models, and this is just to motivate a little bit that what we're doing is sensible. Here again the quantum circuit is essentially the same, and in the lower right they write down their equation for getting the classification, which is very similar to ours on the previous slide, except they also add a bias term, which is just something you can do. And again, it's these three points. This is the ansatz they consider in that paper. In most papers, people are focusing on the ansatz and not so much on the first point, the data encoding. And that's what we're going to look at here.

So here are the three points again, and yep, this is what I just said: the encoding is the one we're going to focus on. As for why this is important: not just in the quantum world but in the classical world too, how you represent data is critical to the success of a machine learning model. On the left here you're seeing the geocentric model of the solar system, where the orbits of the planets are very complicated; you can imagine it would be harder to learn than the heliocentric model on the right, where everything is just a nice ellipse, essentially a circular orbit. So in the quantum world, how we encode classical data is going to be very important, both for the learnability of the model, that is, what sorts of decision boundaries you can learn, and also for robustness to noise.
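Written out, the model just described is the following (one common notation; the bias term from the Farhi-Neven-style variant is omitted):

```latex
% Sketch of the classifier model described above: encode, evolve, measure one qubit.
\begin{align}
  x \;&\mapsto\; \rho_x
      && \text{(data encoding)} \\
  \rho_x \;&\mapsto\; U(\theta)\,\rho_x\,U^\dagger(\theta)
      && \text{(ansatz evolution)} \\
  \hat{y}(\rho_x) \;&=\;
    \begin{cases}
      0, & \operatorname{Tr}\!\left[\Pi_0\, U(\theta)\,\rho_x\,U^\dagger(\theta)\right] \ge \tfrac12,\\
      1, & \text{otherwise},
    \end{cases}
      && \Pi_0 = \lvert 0\rangle\!\langle 0\rvert \otimes I
\end{align}
```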
And there's an interesting, sort of qualitative, thing that we see: there's a trade-off between these two.

This is a cartoon picture of how I think about encoding classical data in quantum states. You have Hilbert space, which everyone likes to comment is very large, this huge place where we can store all of our big data. But that's not really completely true: there's a long, narrow pipe that we have to squeeze the data through to get it there. A lot of work in quantum machine learning assumes a full wave function representation of an input feature vector, but preparing this is known in general to take time linear in the number of features, which is exponential in the number of qubits. So this can be a very limiting assumption, especially for near-term practical approaches. Another thing is QRAM, which is a memory model for quantum computers. This is interesting from an oracle and complexity standpoint, but again, in terms of practicality, it can't really be done right now, and maybe not ever. And the third thing, which I mentioned briefly before, is quantum data, which doesn't have this encoding problem, because the underlying data comes with nice physical assumptions, such as locality and other things. So there you may be able to get away without it, but if you want to do any quantum machine learning on classical data, you're always going to have this problem.

These encodings have been studied a little bit, but not too extensively. This is a table taken from the very nice book by Maria Schuld and Francesco Petruccione. One example they have here, again, is the full wave function encoding, where you encode each feature into an amplitude; you get a logarithmic number of qubits but exponential circuit depth. The table lists other encodings they consider. I won't go through them all; it's just to show you that this has been thought about a little bit, but not too much. One nice recent paper defines a tensor product encoding, where you take the cosine and sine of each input feature and form a tensor product over all of the qubits. Here, if you had N features, you would encode them into N qubits, so it's one feature per qubit. That's another type of encoding that can be used in a quantum machine learning model.

And when you think about it, you can get creative with how you do this representation. This is the dense angle encoding that we used in the example earlier. Here we still have cosine and sine, but you can also add this exponential phase term and pack two features into one qubit; that's why we call it the dense angle encoding. And this is the same image we saw earlier. If you really think about it, you can generalize this to any L2-normalizable functions. We don't have to use cosine and sine; there's nothing really special about them, other than being one of the standard representations of a qubit. We could use any functional form here, as long as it defines a valid quantum state.

The important and interesting thing is that which functions we use directly determines what decision boundaries the model is capable of learning. For the model we're considering, you can figure out what the decision boundary is, at least in principle, by solving this equation, and for the single-qubit classifier it's not bad to do: you can write it down in terms of the optimal matrix elements.
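To make this zoo of encodings concrete, here is a short sketch of each as an explicit state vector. The angular scalings are my own choices and may differ from the conventions in the book and papers cited above.

```python
# Sketches of the encodings discussed above, as explicit state vectors.
import numpy as np

def wavefunction_encoding(x):
    # N features -> log2(N) qubits: each (normalized) feature is an amplitude.
    x = np.asarray(x, dtype=complex)
    return x / np.linalg.norm(x)

def qubit_encoding(x):
    # N features -> N qubits: tensor product of cos/sin single-qubit states.
    state = np.array([1.0 + 0j])
    for xi in x:
        state = np.kron(state, [np.cos(np.pi * xi / 2), np.sin(np.pi * xi / 2)])
    return state

def general_encoding(x1, x2, f, g):
    # Any f, g with |f|^2 + |g|^2 = 1 on the data domain defines a valid
    # single-qubit encoding; cosine and sine are just one choice.
    return np.array([f(x1, x2), g(x1, x2)], dtype=complex)

# The dense angle encoding as a special case of the general (f, g) form:
dense = lambda x1, x2: general_encoding(
    x1, x2,
    f=lambda a, b: np.cos(np.pi * a / 2),
    g=lambda a, b: np.exp(2j * np.pi * b) * np.sin(np.pi * a / 2),
)
print(np.round(dense(0.3, 0.7), 3))  # a normalized one-qubit state
```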
The precise variables in the equation aren't what matters here; what is important is that it depends on the underlying data encoding. So ρ₀₀ depends on the function f here, and ρ₀₁ depends on f and g. And here are two examples of two different encodings and the types of decision boundaries you can learn. I'm not trying to hypnotize you here, I'm just trying to emphasize the point that on the left, for the wave function encoding, the decision boundaries you're capable of learning are distinctly different from the ones on the right, where you're using the dense angle encoding. So that's one interesting thing about this: the type of encoding, how you represent your data, matters. Let me skip this, it's just a side note.

The other interesting question you can ask is: for a given encoding, how robust is it to noise, to a particular noise model? This is the main question we're focusing on in our work. So what do I mean by robustness? Let ŷ(ρ_x) denote the label predicted by the quantum classifier when there is no noise. Then, for some noise channel E, we say that the classifier is robust to this noise channel if and only if the following holds. What this equation is saying is: on the left is the prediction in the noisy case, and we want that to match the prediction in the noiseless case, okay? So you don't need the measurement statistics, or even the state, to be the same; all you really want, for the purposes of classification, is for the predictions to be the same. This is interesting for several reasons. It's related to the idea of fixed points of quantum channels, sorry, I lost my mic there, but it's sort of a generalization of that: a fixed point is definitely a robust point, but a robust point is not necessarily a fixed point.

So I thought I would give two or maybe three slides of very brief background about noise in quantum systems. I can go through this quickly, but just to give you an idea: noise occurs because we can't completely isolate our quantum computers, or our quantum systems in general; they're always interacting with the environment. So we have unitary evolution acting not just on the principal system, that is, our quantum computer, say, but also on environmental variables as well, and the thing we measure is what you get after tracing out the environment. We often use the equivalent, but maybe more convenient, operator-sum representation, where you define these Kraus operators E_k, which satisfy the equation at the bottom, and these define a particular quantum channel.

There are quantum channels that are commonly studied, and just to give you an idea, here are some which have nice graphical representations for intuition. On the left is depolarizing noise; for a single qubit you can think of this as uniformly contracting the Bloch sphere inward. The radial directions stay the same, but the radius itself shrinks towards the origin. You also have dephasing noise; dephasing contracts the sphere towards the z-axis, the computational basis, so this is sort of making a qubit into a bit. And there's Pauli noise, which is a slight generalization of depolarizing noise, where you can have different deformations along the different axes, the x, y, and z axes of the Bloch sphere.
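The operator-sum representation is easy to experiment with numerically. Here is a small sketch using the standard Kraus operators for a Pauli channel (the parameter conventions are mine):

```python
# Applying a channel in operator-sum (Kraus) form: rho -> sum_k E_k rho E_k^dagger.
import numpy as np

I = np.eye(2); X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]]); Z = np.diag([1, -1])

def apply_channel(rho, kraus_ops):
    return sum(E @ rho @ E.conj().T for E in kraus_ops)

def pauli_channel(px, py, pz):
    # Kraus operators for Pauli noise; depolarizing noise is the symmetric
    # special case px = py = pz.
    p0 = 1 - px - py - pz
    return [np.sqrt(p0) * I, np.sqrt(px) * X, np.sqrt(py) * Y, np.sqrt(pz) * Z]

# Sanity checks: completeness sum_k E_k^dagger E_k = I, and trace preservation.
ops = pauli_channel(0.1, 0.05, 0.2)
assert np.allclose(sum(E.conj().T @ E for E in ops), I)
rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
print(np.trace(apply_channel(rho, ops)))  # ~ 1.0, the trace is preserved
```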
And the one I want to mention, because it's important for a result we have later on, is the amplitude damping channel, which is sort of a flow towards the ground state of the qubit; that's represented in this picture here. Okay, so that's a very quick crash course in noise channels. You can consider two regimes; maybe in the interest of time I'll skip this slide as well. All I'm saying here is that you can consider the regime in which noise acts only during the unitary evolution, and you can imagine this is slightly easier to prove things in; that's where most of our results are. The more realistic case is where noise also happens during the encoding as well; that's on the bottom of the slide. That's all I'm saying here.

Okay, so what's interesting is that we can show that this classifier we defined, this quantum neural network classifier, is robust to Pauli noise if this condition here is satisfied: the probability of a bit flip plus the probability of a bit-phase flip is less than or equal to one half, that is, p_x + p_y ≤ 1/2. I debated whether or not to go through the proof, but it's actually very simple, so just to give you an idea of how to show this, I thought I would. Again, we define robustness as the classification in the noisy case being the same as the classification in the noiseless case. On the left-hand side of the equation in the proof, we write down what the prediction is in the noisy case: you have the trace of Π₀ against the evolved state, but with the noise channel acting on it. You can use simple properties of the trace, linearity here, to write this in terms of an expectation in the noiseless case; here it also looks like something that's an expectation in the noiseless case. And you can just do a substitution and massage things a little bit to make the noiseless classification appear, that is, the trace of Π₀ with ρ̃_x. Now you just argue as follows. This is the quantity that determines our classification in the noiseless case. If it's greater than or equal to one half, then all you do is substitute that into this equation, and you see that the noisy prediction is also greater than or equal to one half. So you have robustness there; that's the definition, that's how we defined robustness. In the other case you can flip it around: if it's less than one half, you can show the same thing.

So that's one result, and this is an example illustrating it. On the left, we have a Pauli channel in which p_x + p_y is less than one half. The points are colored green if they are robust, that is, if the noisy and noiseless classifications agree, and all the points here are robust, as we would expect from the last result. On the right, you're seeing an image where that condition is not satisfied, so p_x + p_y is not less than or equal to one half, and we have points that are misclassified; those are the red X's, I hope that's visible. It's interesting that not all points are misclassified, only some of them; I'll get to that in a few slides.

I'll skip these corollaries. They're just saying: you might ask what's special about p_x and p_y, since that's a sort of asymmetry. The reason those are the ones in the condition is that we're measuring in the z basis. So you could flip this around, measure in a different basis, and correspondingly change the condition.
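As a sanity check on that condition, here is a quick numerical experiment (my conventions; the ansatz is absorbed into the random states, since U ρ U† is just another state). With p_x + p_y = 0.4 ≤ 1/2, the noisy and noiseless predictions should agree on every sampled state:

```python
# Numerical spot-check of the Pauli robustness condition p_x + p_y <= 1/2.
import numpy as np

rng = np.random.default_rng(1)
PI0 = np.diag([1.0, 0.0])
X = np.array([[0, 1], [1, 0]]); Y = np.array([[0, -1j], [1j, 0]]); Z = np.diag([1, -1])

def pauli(rho, px, py, pz):
    return ((1 - px - py - pz) * rho + px * X @ rho @ X
            + py * Y @ rho @ Y + pz * Z @ rho @ Z)

def label(rho):
    # Class 0 iff the probability of measuring |0> is at least one half.
    return 0 if np.trace(PI0 @ rho).real >= 0.5 else 1

def random_pure_state():
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    v /= np.linalg.norm(v)
    return np.outer(v, v.conj())

robust = all(
    label(rho) == label(pauli(rho, 0.3, 0.1, 0.2))  # px + py = 0.4 <= 1/2
    for rho in (random_pure_state() for _ in range(10_000))
)
print(robust)  # -> True: every sampled prediction is unchanged by the noise
```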
And this might be useful if you have a noise model for a particular computer where one of p_x, p_y, or p_z is high: you can then measure in a different basis and get better classification results. We can also show unconditional robustness to dephasing. And, I won't go through the proof of this, but just to mention it, you can also show unconditional robustness to depolarizing noise. This is nice because depolarizing noise can be acting at any point in the circuit; as far as this quantum classifier is concerned, depolarizing noise is sort of a joke, it doesn't really matter. So that's nice. I'll skip the proof of this.

The one I want to highlight briefly in the remaining time is the case of the amplitude damping channel. This is where the underlying data encoding really shows up and matters a lot. The result we're able to show here is robustness if and only if the equation in the middle of the slide holds. On the right-hand side, the lowercase p is the strength of the noise channel; on the left-hand side, the condition involves the functions f and g, which define the data encoding, and the matrix elements of the optimal unitary, U₁₀ and U₁₁. So that's the condition for robustness. You might ask whether it's possible to find any functions f and g which satisfy this condition, and the answer is yes. The reason is related to fixed points, and I'll explain that in a minute, but first I just want to give some illustrations of this, again to highlight where the underlying encoding matters.

First, let's consider the wave function encoding, the standard one that a lot of people consider. Here, again, I'm using the same notation: points are colored green if they are robust and marked with red X's if they are not. Here the strength of the channel is zero, so this is the noiseless case, and every point is robust. Now we start increasing the noise. At channel strength 0.1 we start getting points that are misclassified: a certain set of points, which are along the diagonal here in case you can't see it, are the ones being misclassified. And as we increase the strength of the channel, this set of non-robust points grows and keeps growing until it saturates. One thing to note here is that you still have points that are robust, even though you have this noise channel acting. So that's important.

Now I want to show the same example, but with a different data encoding. This is the dense angle encoding, which we defined previously. Here, as we increase the strength of the channel, you can see that you still get misclassified points, but it's a different set than what we saw with the previous encoding. You have points getting misclassified here, and the set grows as I keep increasing the strength of the channel, until it saturates. So again, the same behavior, but the interesting and crucial thing is that this is a different set of points. So the robustness of the model, the robustness of the classifier, really depends on the underlying data encoding, as does the learnability.

I'll mention this very briefly: you can always achieve a robust data encoding for any trace-preserving quantum channel, because any trace-preserving quantum channel always has at least one fixed point. So if you're really set on robustness, you can encode everything into that fixed point.
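To see the encoding dependence numerically, here is an illustrative sketch, not a reproduction of the talk's plots: it uses a fixed, arbitrary (not trained) unitary, the standard amplitude damping Kraus operators, my angular scaling for the dense angle encoding, and a grid of points in the unit square, and it counts how many points keep their prediction under noise for each encoding. The two counts will generally differ, which is the point of the plots described above.

```python
# Illustration of encoding-dependent robustness under amplitude damping.
import numpy as np

def amp_damp(rho, p):
    # Standard amplitude damping Kraus operators with strength p.
    E0 = np.array([[1, 0], [0, np.sqrt(1 - p)]])
    E1 = np.array([[0, np.sqrt(p)], [0, 0]])
    return E0 @ rho @ E0.conj().T + E1 @ rho @ E1.conj().T

def label(state, U):
    rho = U @ np.outer(state, state.conj()) @ U.conj().T
    return 0 if rho[0, 0].real >= 0.5 else 1

def label_noisy(state, U, p):
    rho = amp_damp(U @ np.outer(state, state.conj()) @ U.conj().T, p)
    return 0 if rho[0, 0].real >= 0.5 else 1

wavefunction = lambda x: np.array([x[0], x[1]]) / np.linalg.norm(x)
dense_angle = lambda x: np.array([np.cos(np.pi * x[0] / 2),
                                  np.exp(2j * np.pi * x[1]) * np.sin(np.pi * x[0] / 2)])

t = np.pi / 4  # an arbitrary fixed "trained" unitary, for illustration only
U = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
grid = [np.array([a, b]) for a in np.linspace(0.05, 0.95, 20)
                         for b in np.linspace(0.05, 0.95, 20)]
for enc, name in [(wavefunction, "wavefunction"), (dense_angle, "dense angle")]:
    n = sum(label(enc(x), U) == label_noisy(enc(x), U, p=0.3) for x in grid)
    print(f"{name}: {n}/{len(grid)} grid points robust at p = 0.3")
```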
However, this is where that qualitative trade-off comes in. From a machine learning perspective, if you're encoding everything into the same point and trying to learn something about that data, that's a very bad idea, because you have no way to discern between the points. From a quantum information perspective, that's maybe nice, because you know that they'll all at least be classified consistently; they'll all be robust. So that's an extreme case, but the trade-off also holds for less extreme examples: roughly, the more robustness you have, the less learnability, and maybe vice versa. This was a nice picture, somewhat related to what I was just saying, that I should have had up before I bored you with that explanation. But yeah, it's a nice picture; it was made by Brian, so thanks, Brian. And this slide should have been skipped, as you can see.

So, to conclude the second part, and then conclude the whole talk in a few minutes: encoding classical data is an important but understudied problem in the literature. Many speed-ups in QML algorithms rely on encodings which may not be possible for practical purposes. We saw that the data encoding directly determines the learnable decision boundaries, and it also determines the robustness. The classifier model itself, independent of the encoding, has some nice robustness, and for certain error models you can encode your data in such a way that you can still ensure robustness, while perhaps maintaining learnability as well; that's the example we saw with the amplitude damping channel. And again, the last point: a robust data encoding always exists, just by virtue of fixed points, but likely at the expense of the learnability of the model.

Okay, so that's it. Now I just have one slide saying that we welcome your contributions. If anything I said sounded interesting at all, I'd love to get in touch with you. If you're more from a programming background, we have a lot of programming that needs to be done; if you're interested in research, we'd love to hear your ideas, and we have ideas too. This is our GitHub, and you can connect with me as well; I'll be around for the rest of the day.

And yeah, I'd like to thank Will Zeng, the Unitary Fund, and all of its sponsors for giving us support for NISQAI. I really want to thank these guys, Nick Ezel, Yusuf Omula, Joe Yasue, and Arqintiku, for contributing a lot to NISQAI up front, and others whose contributions came later down the line. Special thanks to Brian Coyle, who started contributing to NISQAI and then also worked with me on this research part; again, I met Brian last year at FOSDEM, so that's nice. I'd also like to thank both Nana Liu and the NASA Coyle team for useful comments on the robust data encodings work. And thanks to you all for listening; I'd be happy to take any questions.