And he's going to speak about an analog quantum variational embedding classifier.

Okay, my talk is on our analog quantum variational embedding classifier, and it's a collaboration between me and these coworkers. In their 2019 paper, Havlíček et al. proposed and demonstrated a variational embedding classifier based on a gate-based quantum computer. In this IBM work, the data x is embedded as a quantum state in a Hilbert space by this circuit, where U_Φ(x) is a data-dependent entangling gate and the H's are Hadamards. So the circuit takes some trivial initial state to some embedded state. The key concept here is the decision operator W, whose expectation value gives, basically, a binary classification: it defines a hyperplane separating the two classes in Hilbert space. This is an illustration with a one-qubit Hilbert space, the Bloch sphere. You embed the data into points on the Bloch sphere, the data is labeled, and you can find some operator that separates the two labeled data sets. We can determine the decision operator variationally so as to best separate the two classes. Equivalently, we can fix W and apply a variational circuit to the embedded quantum state. But the decision operator in the IBM paper is not a general solution; more general operators can be found in other literature, such as Seth Lloyd's 2020 paper. In that 2020 paper, Lloyd et al. proposed quantum metric learning, which uses a general decision operator that works for the general multi-qubit case. The algorithm does classification based on the distance to some average density matrices, which are obtained by averaging the embedded quantum states for each label. It is based on parameterized QAOA circuits where the rotation angles of the gates are the circuit parameters. And, most importantly, the nonlinearity needed for the classification in this case is not purely from the quantum part.
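Since the embedded state in the IBM scheme is just a data-dependent phase layer applied after Hadamards on the all-zero state, a minimal two-qubit sketch can make the structure concrete. This is only a sketch under assumptions: the phase function x[0]*x[1] and the single entangling layer are illustrative choices, not the exact U_Φ(x) from the paper.

```python
import numpy as np

H1 = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # single-qubit Hadamard

def embed_gate_based(x):
    """Sketch of a Havlicek-style feature map on two qubits:
    Hadamards on |00>, then a data-dependent diagonal entangling phase.
    The phase function x[0]*x[1] is an illustrative assumption."""
    # Hadamards take |00> to the uniform superposition
    psi = np.kron(H1, H1) @ np.array([1, 0, 0, 0], dtype=complex)
    # exp(-i*phi*(Z tensor Z)) is diagonal with phases on (+1,-1,-1,+1)
    phi = x[0] * x[1]
    psi = np.exp(-1j * phi * np.array([1, -1, -1, 1])) * psi
    return psi
```

Different data points pick up different entangling phases, so they land on different embedded states in Hilbert space.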
Following this thread, here are some details of Lloyd's algorithm. The data is transformed by a neural network before being fed into a QAOA circuit, which is basically alternating single-qubit and two-qubit entangling gates. Some of the gate parameters are filled with the transformed data, and the rest are taken as variational parameters used to train the system. The density matrices rho and sigma are the ensemble density matrices corresponding to the two classes, and the decision operator for binary classification is either rho minus sigma, or Pi-plus minus Pi-minus, where Pi-plus and Pi-minus are the projectors onto the positive and negative parts of rho minus sigma. Either decision operator will work, and its expectation value gives the classification. You can also visualize the clustering of the embedded quantum states with the overlap matrix, whose entries are basically the squared inner products of the embedded states. Now let me talk about our classifier. Our work and the previous classifiers have some similarities, but compared with Lloyd's classifier, the defining features of ours are: first, we removed the neural network, so the nonlinearity in our classifier arises completely from the quantum part, from the nonlinear dependence of the final state on the schedule parameters. Second, we replaced the gate circuit with an analog quantum computer, such as an annealer. Third, we extended it to the multi-class case using a simple distance-based classifier, basically an L2 norm. The motivation is to answer two questions. One: could the quantum part alone do the heavy lifting, that is, provide the nonlinearity needed for classification? And two: will an analog quantum evolution work? So here is an illustration of our algorithm.
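The decision operator and the overlap matrix just described can be written down directly. A minimal numpy sketch, with hypothetical helper names; rho and sigma are the class-averaged density matrices, and the binary label is the sign of the expectation value of rho minus sigma:

```python
import numpy as np

def overlap_matrix(states):
    """Entries |<psi_i|psi_j>|^2; a block structure reveals class clustering."""
    return np.array([[abs(np.vdot(a, b)) ** 2 for b in states] for a in states])

def decision_operator(class_a, class_b):
    """W = rho - sigma, the difference of the class-averaged density matrices."""
    rho = sum(np.outer(s, s.conj()) for s in class_a) / len(class_a)
    sigma = sum(np.outer(s, s.conj()) for s in class_b) / len(class_b)
    return rho - sigma

def classify(psi, W):
    """Binary label from the sign of <psi|W|psi>."""
    return int(np.sign(np.real(np.vdot(psi, W @ psi))))
```

For perfectly clustered classes the overlap matrix is block diagonal and rho minus sigma separates the classes exactly; for real embedded data the blocks are only approximately separated.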
So the data, reshaped into a d-dimensional vector, is transformed into some schedule coefficients by a linear transformation matrix. The schedule coefficients define schedule functions, which are just the drives on the qubits and their couplings. Here we consider an annealing setting with independent control of the Pauli terms. The system starts with a transverse initial Hamiltonian and ends at a final longitudinal Hamiltonian controlled by the schedule functions. The system's quantum state evolves from a trivial initial state, the ground state of the transverse initial Hamiltonian, to some embedded final state. The nonlinearity in our classifier is different from that in a neural network. In a neural network, the nonlinearity comes from the activation function, such as a sigmoid, used in the network; in our case, the nonlinearity comes from the nonlinear dependence of the embedded quantum state on the Hamiltonian parameters. If we consider some data in a 2D plane, that is, points in some real space, then a neural network basically transforms the points from one real space to another real space. But our classifier transforms the data from a real space into a Hilbert space. So it's kind of a new type of neural network. We characterize our algorithm by numerical simulation, and the schedule in our simulation is actually a digitized version; as you increase the number of time steps, you converge to the continuous case. We tested our classifier on some common linearly inseparable data sets, basically the industry standards for machine learning, such as concentric circles, spirals, and MNIST digits. These are not linearly separable, so we need some nonlinearity to classify them correctly. Our classifier works for all the above cases, and I will take the concentric circles and MNIST cases as examples.
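The embedding pipeline just described, from data through a linear map to schedule coefficients and then through a digitized anneal, can be sketched for a single qubit. The linear ramp, the total time T, and the step count below are illustrative assumptions, not values from the talk:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def step_unitary(H, dt):
    """Exact exp(-i H dt) for a Hermitian H via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w * dt)) @ V.conj().T

def embed_anneal(x, W, n_steps=50, T=5.0):
    """Digitized anneal from the transverse ground state |+> to a
    data-dependent final state. W maps the data vector to a schedule
    coefficient; the linear ramp and T=5 are illustrative choices."""
    c = (W @ x)[0]                        # schedule coefficient from the data
    psi = np.array([1, 1], dtype=complex) / np.sqrt(2)  # ground state of -X
    dt = T / n_steps
    for k in range(1, n_steps + 1):
        s = k / n_steps                   # digitized annealing parameter
        H = -(1 - s) * X + s * c * Z      # transverse to longitudinal
        psi = step_unitary(H, dt) @ psi
    return psi
```

Increasing n_steps converges to the continuous schedule, and different data vectors end at different points on the Bloch sphere, which is exactly where the nonlinear dependence on the Hamiltonian parameters enters.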
So for the concentric circles, we have points distributed around some concentric circles, with a nonlinear gap separating the different classes. We can visualize the clustering of the embedded data of different labels with the overlap matrix, whose matrix elements are the squared inner products between embedded quantum states. In these illustrations, the labels are circle one and circle two, the labels for the concentric circles, and each pixel represents the overlap between two embedded states. We can see that before training, the overlap within the same class has basically the same magnitude as the overlap between embedded states from different classes. The block-diagonal parts represent the overlap within the same class, and the off-diagonal blocks represent the overlap between embedded states belonging to different labels. After training, the overlap within the same class is now much larger than the overlap between different classes, so there is some clustering behavior here. We can also visualize the separation of the classes by the trained classifier using the time evolution of the overlap matrix, and we can see that the aggregation, or clustering, behavior indeed shows up during the time evolution caused by the annealing Hamiltonian. In this sequence, the s equals zero case, this one, is the start of the anneal: the whole system is still in the trivial initial state, so the overlaps between any embedded states are equal. With the trained classifier, the annealing Hamiltonian leads to a data-dependent evolution, so the initial state evolves and finally settles down at the final embedded states.
And we can see that after the evolution, or embedding process, is finished, the embedded states show this clustering behavior: all embedded states within the same class have a strong overlap, but overlaps between different classes are very weak. We also checked the influence of the number of qubits and the number of labels on the performance of our classifier, taking the concentric circles as an example. These two figures show the train and test accuracy as a function of the number of qubits for the two-, three-, and four-label (circle) cases. We can see that for the same number of qubits, increasing the number of labels, that is, two circles, three circles, four circles, decreases the accuracy. And fixing the number of labels, the accuracy increases as you increase the number of qubits. The scaling on the test data and the training data is similar, suggesting good generalization to the unseen test set. Those are the results on the concentric circles data set. We also tested our classifier on MNIST digits, both binary classification on digits three and five, which I can kind of distinguish by eye, so let's see whether the classifier can here, and multi-label classification on digits one, three, and five. As usual, we visualize the clustering of the embedded data with the overlap matrix. Before training, the overlap within the same class is basically as strong as the overlap between different classes; after training, the overlap within the same class is very strong, while the overlap between different classes is very weak. So we again see the aggregation, or clustering, phenomenon. And again, we also visualize the time evolution of the overlap matrix. At the beginning, everything is in the same state, so the overlap is uniform.
After the time evolution induced by the data-dependent Hamiltonian, the embedding is finished and the embedded data again shows clustering behavior. Here is a summary of our binary and multi-label classification tasks: we can see that increasing the number of qubits boosts both the train and the test accuracy. So, as a summary, we proposed an analog quantum variational embedding classifier using an analog quantum computer instead of a gate-based quantum computer. The nonlinearity in our classifier comes completely from the quantum part. We also extended the variational classifier to the multi-class case, and we tested it on linearly inseparable data sets such as concentric circles, spirals, and MNIST digits. The performance of our classifier can be improved by increasing the number of qubits. As an outlook, we also tested some other Hamiltonian forms, such as an XX plus YY interaction Hamiltonian; you can find the details in a poster that will be presented by another colleague from our group. And we can probably also try to realize it experimentally, at least for the single-qubit case, which is probably the easiest to do. This work is supported by DARPA. Okay, that's my presentation. Any questions?

Thank you for the talk. We can move to the questions from the audience. Yes?

I didn't completely follow the design of the algorithm you implemented using the annealer. But could you explain the motivation for why you think, or do you think, that it has the potential to be better than algorithms on classical computers in some respect?

The nonlinearity part will not be classically simulable for a larger system. That's the main argument: it could provide some nonlinearity that's beyond the reach of a classical system. But whether it will be better than classical classifiers, that's an open question.

So the nonlinearity is hard to simulate classically, but there is...
So in order for the algorithm to have better performance, the nonlinearity should not only be hard to simulate classically; the specific form of the nonlinearity should also help the generalization or the performance of the algorithm.

Yeah, that is the point: maybe a large quantum system can have better expressivity, as suggested by some clues from tensor networks and the like. The quantum system has the potential for better expressivity.

Okay, other questions? There's a question in the chat. Can you read the question?

The question is: are there obstacles to testing this approach on a quantum annealing device instead of an analog quantum simulator?

Actually, the Hamiltonian setting we use here is exactly that of a quantum annealing device: it anneals from an initial transverse Hamiltonian to a final longitudinal Hamiltonian. It's exactly a quantum annealer, so there should be no fundamental difficulty in trying it on an annealing device. And since an annealer basically runs in a continuous mode, I call it an analog quantum computer; I think the nomenclature is appropriate.

Okay, was there another question from the audience?

Yeah, thank you for the talk; it was certainly an interesting approach. I have two questions that are kind of related. On slide 14, you show the performance of the classifier over the number of qubits. For the three-label problem, the performance seems to converge to a value lower than one. Did you try larger systems, and did it eventually go up to one, or does it remain at that level?

It is already very time-consuming to do; we would need a bigger computer and better numerical simulation code to push further.

Okay, so the second, kind of related question, if I may.
Most universality proofs for neural networks that I'm aware of, maybe even all of them, rely on the trainability of the weights that you apply after the nonlinearity. So do you have any trainable parameters in your approach that you apply after the nonlinearity, or only before it?

I don't quite get your question. Could you explain it again?

Yeah, so in a neural network, basically wherever you draw an arrow, you have a trainable weight, right?

Yeah, yeah. Our weights are all in this part.

Exactly. So in the arrow that you labeled as Hamiltonian, do you also have any trainable weights in that part of the algorithm?

In the Hamiltonian part, we don't have variational parameters to train. All the parameters are defined by this transformation here, which converts the data into the schedule parameters. So all the weights are here. We use a very simple setting; of course you could add more trainable features into it, but we chose the simplest possible case.

All right, thank you. Any other questions? Okay. Another one, yeah? Oh, okay.

So, thank you again. You're welcome. Have you thought about whether this is a universal classifier? Should it always work, or is it an algorithm that seems to work in your numerical experiments but hasn't been analyzed mathematically yet?

Going from numerical experiments to a rigorous analysis, maybe of the computational complexity, I think is a big leap. We haven't touched that part yet.

All right, thanks. The other, fairly brief question: I think it was on slide five where you explained Seth Lloyd's approach, if I can ask you about that. They use this ResNet part for the feature extraction, I assume. Do you know whether, in their approach, they freeze those weights or train them as well?
They use a pre-trained ResNet, but they make that part changeable, trainable, yeah.

Okay, cool. Thank you.

If there are no more questions, I'll thank Dr. Yang again for his talk.