Hello everyone. This is going to be the afternoon session, in which Konstantinos Meichanetzidis will talk about QNLP in more depth. It will cover the experiments that we've realized and what we're working on at the moment. This will be followed by an interactive session by Richie, who is going to give you a demo of lambeq, our QNLP toolkit, and hopefully we'll make this a bit interactive. He's already sent you the notebooks on Slack, so you can already download them. I mean, focus on what Konstantinos is saying, but if you have a bit of time you can download and install them. All right, the floor is yours.

Thank you very much. Can everyone hear me? Yeah. Okay, hello. We will close with experiments on QNLP. I will show you how we realize in practice all of the stuff that Bob was talking about, specifically for doing quantum natural language processing on actual quantum computers as well as in numerical simulations, and I will close with the vision for the next year and the experiments we will show a lot of you in a year or so. So, I'm Konstantinos, and we are the Oxford team, as Bob has probably said, a big group focusing on experiments for QNLP. I will guide you through the theory of how we build models for QNLP, and then at the end I will show you very briefly some of the results we have obtained. If you want more details, we can talk later. At the very end I will show you one specific experiment, basically a blueprint for an experiment, but that doesn't mean it will be the only one; it is only the first step for large-scale QNLP with text circuits.

So, let's get started. I have here Bob, John Firth and Jim Lambek, and each one has contributed a specific idea; if you put these three ideas together, you get quantum DisCo models, right? Quantum from Bob; distributional from John Firth, who says that you shall know the meaning of a word by the company it keeps — basically he was saying, create word embeddings from large text by counting how words occur in each other's contexts; and Jim Lambek was saying, I love algebra, so I will make a grammar model based on algebra. We will see how quantum and algebra play very well together — I'm sure Bob talked about this briefly, but here I will show you how we use this to build quantum models specifically. So, John Firth's idea for word embeddings here specifically will mean: I will create my word embeddings, I will stick my meanings inside Hilbert space, right? And my model will manipulate these meanings so that I can make a task in NLP work. The compositional aspect is what gives us some sort of science-based, scientific handle on what's going on in language, because I don't want to be doing the mainstream thing, which is to stack a bunch of layers together in huge neural networks, train them on an initially stupid task, and then they do impressive things like the things you have seen in the news, GPT, whatever. However, what goes out of the window is interpretability and knowing what's going on inside these giant inscrutable matrices of real numbers. So, to tie things together, let's start from where Bob left off, but I won't dwell too much on this because it has been covered. I will just say that my tools will be boxes and wires, as they have been throughout the whole day. I have states, I have effects, I have processes, I have scalars, and I also have an operation that allows me to kill wires, which is the discard. It's all boxes and wires, as it always has been.
So, this is our basis for thinking about things and building models. If you remember from the previous talks today, there are two ways to compose boxes: one is sequential and one is parallel — one process after the other, or both at the same time. With these two ways of composing boxes — and Richie will show you during the demo how you can do this in software with DisCoPy — you can make any bigger diagram out of small boxes. So from small processes you can compose big processes, and there are special kinds of boxes which, if we open them up, turn out to be just wiring: for example a swap, the identity, a cup and a cap. Most importantly, I want to focus on what happens when you stick cups and caps together to make a snake, and Bob has talked about this already. This is like in quantum teleportation, but here, abstractly, it just means that wires can wiggle, right? I don't care about their shape, I just care about how they are connected. And the big selling point of this is that I want to build models such that the wiring actually means something, so that when I look inside my model, when I open it up and see what is connected to what, I can at least somehow understand something. Because if you, for example, look at the wiring that people draw in big neural network diagrams, that wiring doesn't actually mean anything; you have to train on a big task and then go back and inspect. I don't want to have to do this. I want my model to be such that its architecture actually means something, and here the wiring will mean something: it will basically mean how information is flowing around. In quantum it's quantum information, but more generally it's just information; it depends on what type of distributional semantics I give the models. Today we'll only talk about quantum semantics, so the wirings will tell us how quantum information — meaning — is flowing around in my model.

Linguistic processes. Let's go from word to sentence. I have some words in a sentence, I compose them together according to grammar, like Jim Lambek was saying, and then I make a sentence, right? I make the meaning of the whole. This is how pregroup grammar composes things. Every word gets some types assigned to it, and then the types have an algebra that says n together with its right adjoint n^r compose and cancel. This composition I will show with these cups, i.e. wirings. So my grammar will only be wirings. All of my meaning is inside the word states, and grammar is wiring; it just tells me how to compose things. So if I have the meanings of the words and I compose them according to grammar, I get the meaning of the whole sentence, and that meaning is flowing on the s-type wire. The first obvious model-native NLP task you can do is check how similar two sentences are, and this is how you do it. You take one sentence, 'sun melts gelato'; it's a state, a sentence state. You take an effect, which is some other sentence upside down — upside-down things are effects. And that other sentence is, what is it, 'monk dissolves ego', right? Maybe 'melts' is similar to 'dissolves', but the nouns are quite different. So this whole thing will have some overlap, but I cannot evaluate my overlap yet. This is just the shape of it; this is a blueprint for the model I will build later. I haven't shown you that yet.
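(To anticipate where this is going once the diagram is given quantum semantics later on — writing |s1> and |s2> for the two sentence states, my notation rather than the slides' — the similarity that this picture is a blueprint for is just a state overlap:)

    \mathrm{sim}(s_1, s_2) \;=\; \bigl|\langle s_2 \,|\, s_1 \rangle\bigr|^{2},
    \qquad
    |s_1\rangle = |\text{sun melts gelato}\rangle, \quad
    |s_2\rangle = |\text{monk dissolves ego}\rangle .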
And then, if you want to go beyond sentence level, if you want to go to text — because there is this rock star whom I follow on Twitter who says that it's the future, and I believe him. So if you compose sentences, you get texts. How do you compose sentences? Well, with DisCoCirc. What does DisCoCirc say? DisCoCirc says: I don't believe in sentence space. Actually, sentence space is basically a bunch of nouns if you look inside. So DisCoCirc says nouns are first-class citizens. You can initialize a bunch of meaning vacua — let's call them that — and every one of these is distinguishable, so to every one of these wires a specific noun corresponds. Then all of these nouns go into some text process, as we had before for a sentence, but now it's for a whole text. There is a grammar that we are developing in Oxford, but it's a text-level grammar, not only for sentences. And then the nouns, after they get modified by the text process, exit modified. It's a very dynamic thing.

Now, let us see what this text process is composed of. The most basic thing is: let's make some noun states. I have this meaning vacuum, and then there is a noun preparation box. This prepares noun states. So basically, you have the meaning vacuum, you have a noun preparation box, and this prepares a noun state for this noun. Sorry for the squeaks. Okay, this is how I make noun states. They are initial states; they enter a text. Then I have order-one processes. These are boxes. They are not states, they are processes, right? If you remember, before, at the sentence level, every word was only a state — everything was zero order, and then grammar was telling you how to compose. Here I'm making distinctions. I said nouns are first-class citizens; that makes them states, zero order. If I go to other parts of speech, like adjectives or verbs, they will be processes that modify the lower-order stuff, which is the nouns. An adjective modifies a noun, right? So if I have, for example, an adjective — oh my God, sorry, let's see if this helps — I will have some noun entering, and this will give me out a modified noun. For example, car, red, okay: this is a state of red car. The intuition here, the point, is that we are following our intuition to build models, and it's not just our intuition, it's not just loose ideas. All of this is based on formal linguistics, and there is a lot of formal language theory behind it that formulates it in a way that generalizes. Now, let's see how a verb would look. Say you have a verb like 'loves', the standard example. You have dog and human. Dog and human enter 'loves': dog loves human, right? So this is the combined state for the phrase 'dog loves human'. This 'loves' box is coupling them together, and then they are not separable anymore. They entered as two independent things, they interacted through some verb, and now they are in a combined state — a state that, if it were a quantum state, you would say is entangled. That's exactly what I'm going to tell you later. But for now the pictures tell you exactly the same story without me having to tell you anything about quantum, so you see it's a feature of the model for language in the abstract. Okay, that was order-one stuff; now order-two stuff. Yes, the order matters, of course — it matters whether dog is here and human is there, or the other way around.
Of course, we also love our dogs, but sometimes love does not go both ways and you have drama, right? Okay. Yes, that's right — the places where I have inputs and outputs are distinguishable. I mean, I'm not drawing my boxes like this, right? They're not like the spiders of ZX. This is different from this; this box is different. Now I will go only one order higher, which is second order. Let's think of an example here. An example would be an adverb. An adverb is a modifier of a modifier: an adverb modifies a verb, and a verb modifies a noun. So, for example, I can have 'quickly' and I can have 'runs'. Okay, this is 'quickly runs'. The higher-order thing modifies the lower-order thing; this is the point of going to order two. There are also order-three things sometimes, but I'm not going to get there today — ask me later.

And now, if you have a text, like a story — this is my simplified version of The Matrix, best movie ever. This is the script of the movie, right? You have the nouns that go in: Neo, Morpheus, Trinity, Matrix, Kung Fu. And this is how it goes: the verbs become boxes that modify nouns. You see Neo — what do I have? Morpheus finds Neo. Then Neo goes into 'quickly exits Matrix'. Then I have Trinity loves Neo, and so on. So you have the characters with their initial states, there is a story that happens to them, and then they exit modified. By the way, our team is developing a tool that does this in an automatic manner for a large fragment of English. It's all based on CCG parsers — they exist, they are trained on huge CCG treebanks — and our team has figured out a way to use coreference resolution to see which nouns are the same between different sentences, and then you string them all together into a big text diagram. So this is not just drawings — I mean, they are my drawings, but you can see how it works in software too, and Richie can show you some of this later, though I don't think we can make it available yet; coming soon. So you see: the bigger the story, the bigger the text diagram. I'm not going to draw you huge text diagrams, but I really want you to remember that this is how it's going to be. My scaling parameter for my problem size here is the text diagram. This is going to be important later, when I talk about advantage and why we even want to go quantum. The bigger the story, the bigger the text: wider means more nouns, and deeper means more stuff happens in the story — nouns, adjectives, verbs and things like that.

One model-native task: text similarity. You take two texts and you want to see how similar the stories are. There are nouns that are common to the two stories, there are nouns that belong only to the one text, and there are nouns that belong only to the other text. All of these sets of nouns together create a noun universe. Okay. I put all of the nouns side by side, a tensor product. The nouns relevant to text one enter that text; the nouns relevant to text two enter the other text from the bottom. Text two has been turned upside down because it's an effect, right? And then, if you take this overlap, it should have the meaning of text-text similarity, in the same way that sentence-sentence similarity worked. It's just the generalization of the same idea. So compositionality allows us to take small ideas and generalize them to larger contexts without thinking more. That's the point.
Another model-native task is one that is not global — not taking all of the nouns in the universe and doing something with them. This is a local task. I discard all of the other nouns that exit from a text; I don't care about them. I just care, for example, to check the noun at position i: what is its similarity with its past self after it has exited? I gave it a cute name: quantifying the character arc. And if you take this idea of throwing out all of the nouns you don't care about and doing something local with one or a few of the nouns, you can do something which is actually done in the real world — there exists real-world data out there for this task. It is called question answering. So what do you do? You take a context text that states a bunch of facts; that's text T. (The subscript one there shouldn't be there; just T, okay?) You discard all of the nouns that are not relevant to the question. You take the question, you turn it into an affirmative statement, and you stick it on as an effect. So basically you do text similarity between the text and the question, but the question acts locally as an effect, so you can throw away all of the other nouns. Now imagine my context text scales. My question will most likely not scale with it; usually questions are small. Knowledge and context can scale, but usually you ask questions of some finite size — of finite support, to be precise. So I have a local question, and you should imagine that the text grows in size as my context text grows.

So now let's finally go quantum. All of this was abstract, for model building. I started at half past, right? Okay.

When we go quantum, we want to respect the tensor product. If I want to ask a question about an attribute, how does this work? If I have, for example, a red car — whatever, a second-order thing — and I say, give me the color of the car, that seems quite tough.

Yes, you can say 'car is red'. It's a statement. Stick it as an effect onto red and car. So red is an adjective — well, let me think. 'Car is red' — yes, 'car is red' will be a legitimate text diagram. You can flip it upside down, dagger it, and now it's an effect. So the car will exit the text that says whatever about the car, and you stick on 'car is red' as an affirmative statement, as an effect. If you have a high overlap, you're confident that it's red. If the context text said 'car is blue' and then you stick on 'car is red', you should get a low overlap. This is the idea. Okay. Any other questions before I go to model building? Yes — you need the microphone, otherwise it's not going to be recorded.

When you mention the overlap between the entering state and the effect, you have to define it quantitatively, right? So what is the overlap between red and blue, or happy and sad? Because otherwise everything is orthonormal if it's not equal. So there should be some metric, I guess.

This is the point of going to model building. So far I haven't said anything about how you make anything quantitative; this has all been abstract. I have only been talking about the compositional part so far. But this framework is called DisCo, distributional compositional, and the distributional aspect is exactly what you're saying: making everything quantitative. Now we can give distributional semantics. We can decide what the spaces are on which the states are defined, what the state spaces are that the processes act on, right?
We will decide that we want them to be quantum: Hilbert spaces and quantum processes. States will be defined in Hilbert space, processes will be quantum maps. And so you can get quantitative results in exactly the same way you get quantitative results when you do anything in quantum: you take overlaps of states, you measure operators, things like that. Okay. Now, why do I have a big picture of the tensor product here? It's because quantum theory is inherently built on the tensor product: systems compose according to the tensor product. And I want that, because my grammar theory — the boxes and wires that describe my grammar model, the compositional part — also behaves as if things compose with tensor products, because basically I have non-separability when I compose things. This is what Bob was calling shredding, or quantum compositionality; I'll just call it compositionality. So now, the tensor product: before, with the diagrams, it just meant things happen in parallel. When you give the model semantics, the tensor product will be the Kronecker product — the usual one. So if you are simulating these things with linear algebra, because linear algebra is how you simulate quantum theory on your laptop or by hand or on a blackboard, it's the Kronecker product, the usual outer product of two matrices or vectors or tensors or whatever. In quantum theory, if you have a quantum computer and you put quantum systems together in a controlled setup, they just compose by themselves with the tensor product. So yes, the usual tensor product. Here I'm really beating a dead horse to show you that the tensor product is flowing around.

So this F is what I'm calling my semantic functor. Jargon aside, it's basically a map that does — okay, there is a mistake here, I'm sorry about this, but okay — a box-wise substitution of the boxes and the wires. So every wire gets assigned some qubits. For example, we had before, at the sentence level, that every word gets a type. The noun type will get q_n qubits, the sentence type will get q_s qubits. So every wire carries some number of qubits, and it's up to us to decide how many. The number of qubits decides how big the Hilbert space dimension is that flows along the wire: if I have q qubits on a wire, the Hilbert space is 2^q dimensional, right? Everyone knows that. My mistake in this picture is that I have bent the wire on the left to make it an effect, but I didn't do the same on the right-hand side — negligence on my part, but you will understand from context in the next slide what's going on. Now, I want you to focus on 'sun', which goes into 'melts'. 'Sun' becomes a state-preparation circuit, a unitary. So it's a quantum circuit U parameterized by a control-parameter set theta, with a subscript s — theta_s is theta for 'sun', okay? So 'sun' gets its own parameters that go into a quantum circuit, and they prepare a quantum state for the word 'sun'. The same happens for 'melts': there is a theta_melts that goes into a parameterized quantum circuit and prepares a state for 'melts'. And now, after you post-select all of these — okay, let me show you here. So the cup becomes a CNOT followed by post-selections. This you can prove with the rules of ZX; I mean, it's trivial. You just do this: you have a cup like in Bob's lectures, you say I'm going to put an identity spider here, it doesn't change anything. Oh, and by the way, I can grow a one-legged spider on either side using spider fusion.
And then, oh look, this is a CNOT gate, right? So a cup is basically a CNOT followed by post-selection on the plus state and the zero state. So, now that you are experts in ZX, this should be obvious to you: the cups become CNOTs and post-selections. You can bend things around, but you have to be careful: bending corresponds to a 180-degree rotation, so that is not the dagger, that is transposition. It's easy to make this mistake when you write code, but you automate it once and then forget about it. This is crucial: it is transposition when you bend states into effects in DisCoCat, at the sentence level. And now, if you have two sentence states, two sentence quantum states, you can of course take their overlap. This comes back to the question from before: how do you quantify this? You can take the overlap of two quantum states, using whatever protocol you like — the swap test or something; there are standard ways to do this. So you have the quantum state of one sentence, the dagger of the quantum state of the other sentence, and this is a valid quantum operation; you can evaluate it on any quantum computer you like. I haven't told you yet what these thetas are that embed my meanings; I will tell you about that later.

And then, when we go to text level, which is the most interesting thing, again let us see how I make my circuit components for all of the parts of speech according to their order. I said I have zero-order, order-one, and order-two things. The zero-order things are the nouns. So what happens to the nouns? Well, I will choose a convention for my meaning vacuum: it's going to be the zero state. All of the wires in DisCoCirc carry noun types — we said nouns are first-class citizens, nouns flow around and stuff happens to them — so all of the wires will have the same number of qubits, because they're all the same type. So q_n I choose once; it determines the dimension of the Hilbert space flowing around my wires. And a noun-preparation circuit will be this U(theta_n), where theta_n depends on the noun n. So 'dog' will have its own parameter set, 'house' will have its own parameter set, they will be different, and so on. And this thing prepares a state: for example, you have theta of car, and the state is prepared by U(theta_car) hitting the vacuum, which I chose by convention to be the zero state — and that's without loss of generality.

Now, the order-one things are easy. I'll just make everything a unitary. This is a choice; I could make them generic quantum maps by adding ancillas and discarding them, but let's not go there — everything I say from now on will still hold if I make these more general quantum maps. So let's stay with everything being unitary. Whenever I see a box, I will make it a unitary box. And whenever I have adjectives or verbs, I always respect this: you never have a situation where more wires come into a box than exit it. There is a conservation of nouns, right? As many nouns enter a text as exit the text. So it's fine to have everything unitary, because unitaries need to have the same input and output dimensions. So if I have some adjective, for example, I'll give it a unitary, and it will be parameterized by some parameter set specific to that adjective. Now, the higher-order things are a bit strange.
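(To spell out the ZX step from a moment ago — why the cup becomes a CNOT plus post-selections — here is the one-line calculation, ignoring the overall scalar:)

    \langle \mathrm{cup}| \;=\; \langle 00| + \langle 11|
    \;=\; \sqrt{2}\,\langle \Phi^{+}|
    \;=\; \sqrt{2}\,\bigl(\langle +| \otimes \langle 0|\bigr)\,\mathrm{CNOT},
    \quad \text{since} \quad
    \mathrm{CNOT}\,(H \otimes I)\,|00\rangle \;=\; \tfrac{1}{\sqrt{2}}\bigl(|00\rangle + |11\rangle\bigr).

So applying a CNOT and post-selecting the control wire on |+> and the target wire on |0> implements the cup, up to normalisation.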
One choice for such a higher-order box is to break it up like a sandwich. I have this comb thing, which is a higher-order box, like for an adverb, and I break it into unitaries that sandwich something. Let me show you what I mean. If I have 'quickly runs', as in our example before, I will have U(theta_quickly) with a parameter set labelled one, and there will be a second parameter set for 'quickly'. So 'quickly' has two parameter sets, and whenever I have something like 'runs', U(theta_runs) is just sandwiched by these two guys. So these two guys for 'quickly', sandwiching 'runs', give me back a circuit for 'quickly runs', the whole thing. Of course, I can simplify my life. I can say that from order two and above I don't want things to be quantum processes — I don't need everything to be quantum processes all the way up to the highest orders. And also, you know, you won't have infinite-order stuff: in language, everything usually stops at order three or four. I mean, how often do you have a modifier of a modifier of a modifier of a modifier? It's kind of insane. So what I can do is say: I don't want these to be quantum maps. Another choice is to have, say — here, for 'runs', I have 'runs', and I have a classical control for 'quickly' that modifies the parameters of 'runs'. So this can be a classical function, like a neural network, parameterized by its own parameters for 'quickly', and its input is theta_runs. So theta_runs enters 'quickly', gets modified, and enters here; that whole thing together is 'quickly runs'. I mean, this also makes sense, right? 'Quickly' just modifies the 'runs' — it's like a knob. Or 'very': 'very' will also be higher order, 'very red'. I can have a classical control on the circuit that is 'very', and it just modifies the parameters of 'red', and I get 'very red', or 'less red', things like that. But all of these are just design choices, and they're all valid. Okay, similar idea.

And now, to come back to the point of how you do something interesting with quantum text circuits: quantum text circuits for question answering, for example. As I said before, I have a context text, which is now a big quantum circuit. This theta_T is all of the parameters — the concatenation of all the parameter sets of all of the words involved in that text up there, in the text T. Theta_Q, the same: theta_Q is the set of all parameter sets of all of the words that appear in the question Q. So I have a big quantum circuit, and all of the nouns go in. Zero states are initialized on q_n qubits for each wire that you see here; every wire is for a noun. They enter a big text circuit, like my Matrix story before. The nouns — the quantum wires — that are irrelevant to the question are discarded; I never measure them, right? That's what discarding means. And then I apply the dagger of the circuit for the question text, which has been turned affirmative; it's an effect. And then basically I can measure everything — I do a bunch of measurements. I have deliberately drawn these in green because they are the so-called bastard spiders from the dodo book that Aleks and Bob wrote, Picturing Quantum Processes, if you want to see how quantum and classical wires interact. First of all, you need to draw two types of wires: the white ones are the quantum wires, the green ones are the classical wires.
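(Writing out the two design choices just described for a second-order word like 'quickly', in rough notation — the superscripts label the two parameter sets of 'quickly', and f is a hypothetical classical control function such as a small neural network:)

    \text{sandwich:}\quad
    U_{\text{quickly runs}} \;=\; U\bigl(\theta^{(2)}_{\text{quickly}}\bigr)\;U\bigl(\theta_{\text{runs}}\bigr)\;U\bigl(\theta^{(1)}_{\text{quickly}}\bigr),
    \qquad
    \text{classical control:}\quad
    U_{\text{quickly runs}} \;=\; U\bigl(f_{\phi_{\text{quickly}}}(\theta_{\text{runs}})\bigr).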
There are then extra rules for how classical and quantum wires interact. The green dot is basically a Z spider, and very conveniently it models decoherence — measurement in the Z basis. So basically this thing tells me: measure all of the qubits there. If you measure all of the qubits there, you will get a probability distribution over all of the bit strings i, the computational basis states defined by all of the green wires. If you go and look at the probability of all zeros, this quantifies the overlap — because a zero goes in when you prepare a noun, and you want to see how much zero comes out, since this theta_Q basically includes the un-preparation of the nouns involved in theta_Q. So if I see how much zero is coming out — how often I measure the all-zeros state there — this gives me the overlap of this whole question-answering thing. This is very important. Questions about this? Because this is basically the whole setup; you can take this and generalize it to other things. And since I set up the whole thing motivated by wanting my wirings to mean something, I set it up so that it has tensor structure. I could have said: well, forget quantum, or I don't know about quantum — maybe I want everything to just be tensors, and I would end up here anyway, because the model says that's how you should do question answering. This is how the task looks in the model; this is the model-native way of doing question answering — text-text similarity, local text-text similarity. If I had generic tensor semantics, evaluating this thing becomes exponentially hard as the context text scales, because contracting tensor networks, at least exactly — and also approximately — is hard; you need exponential resources. However, I set everything up so that it is valid as a quantum operation. Of course, simulating these quantum operations you would use tensor networks and linear algebra on a computer, which is exponentially hard — quantum mechanics is hard to simulate — but if you have a quantum computer, this is how you do it. Now, whether this beats GPT-whatever is beside the point here, because we started from a motivation. We started from a motivation to build a model, and then you make it work. It's backwards compared to the mainstream: it's not 'make a brute-force, all-powerful god and throw it at tasks'. It's being more of a scientist about things, even if they don't perform amazingly at first. Do you have a question? Yes.

So, this type of circuit for verifying the statement is the dot product of two states, like the fidelity. So if all zeros enter and I measure all zeros at the output, it means that the statement is true. But as far as I know, if the system is too large, this process doesn't work, because there is the orthogonality catastrophe.

So, I think what you're saying is that the bigger the circuits, the more random they will look, the more they will scramble, and getting the zero state becomes exponentially unlikely.

No, this happens also for a noiseless circuit.

I'm also talking about noiseless. Okay. Yes.

If the system is large enough, the orthogonality catastrophe ends up falsifying the statement even though in reality it is true, because all the states appear orthogonal when the system is large enough. I don't know if I expressed myself correctly. Maybe I can ask this question at the end.
My understanding of the orthogonality catastrophe is that when you solve the generalized eigenvalue problem, you end up with lots of linear dependencies and then your matrix inversion becomes unstable. Whereas here you're just solving a variational problem, and the basis is naturally orthogonal, I think, because we're using qubits.

Well, before we say anything about variational, I need to explain the two experimental approaches, because the word variational is important there. I think this conversation has some depth; let's have it later. I think it's basically about how likely it is to even measure zero if the Hilbert spaces grow too much, for generic circuits U_T and U_Q. It's true that if these circuits are random — generic, typical — and I keep growing the number of green wires, then yes, on average it will be exponentially unlikely in the number of green wires to ever measure all zeros. I think this is similar to what you said, but remember what I said before: I don't want to be growing the number of green wires. My questions are always local, and the scaling parameter of my problem is the context text. The only text that grows here is T, not Q. I mean, if what you described were a problem, then the whole BQP setup would be a problem. The whole way we define quantum Turing machines and decision problems with quantum circuits is like this: you throw all of the qubits into a circuit that grows and you just measure one. Here I'm not measuring one, I'm measuring, say, five — but I'm not measuring a number of qubits that is a function of the text T. I'm always measuring, say, five, and that's it. So the number of shots you need is finite for a specific additive error on this probability, P of all zeros, independently of how the other thing scales.

Now, I never said what the U(theta) is. U(theta) is some unitary that I said is parametrized. In practice, how do you do this? You pick circuits that people have studied in quantum machine learning and like because they are expressive. Expressive means that they explore the Hilbert space on which they are defined quite effectively, as if they were random, for random choices of their control parameters. Here the parametrized gates are the rotation gates and the controlled-rotation gates; the thetas are not shown, but they should be implied. Hadamard, of course, and CZ are not parametrized, but Rx, controlled-Rx and Ry are parametrized — they have the thetas inside. One choice of U(theta) is the thing on the left. I've drawn them going downwards, in the spirit of all of these language circuits, because everything is read from top to bottom; usually people in their papers draw quantum processes going from left to right, so here I've rotated things a bit, sorry about this. But these are actually the circuits we use in our experiments. Another choice is the thing on the right. It's three layers of a block, which is a layer of Hadamards, a layer of controlled-Z and a layer of Rx, and then I repeat this three times. How many layers of this block I choose is up to me; it's a hyperparameter. How thick these things are is my q_n; it's a hyperparameter again — I choose how many qubits I want to assign to every wire.

So, I said I'm going to talk about the approaches we use to do experiments. One approach is: train everything in task. Build my text circuit, leave all the thetas free, pick a task.
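(To spell out the 'probability of all zeros' from above in a formula — my notation, just a sketch: U_T is the context-text circuit acting on all the noun wires, U_Q is the circuit of the affirmed question acting on the k question-relevant wires, and the irrelevant wires are discarded, i.e. traced out:)

    P(0\ldots 0) \;=\; \langle 0|^{\otimes k q_n}\; U_Q^{\dagger}\,\rho_{\mathrm{rel}}\,U_Q\; |0\rangle^{\otimes k q_n},
    \qquad
    \rho_{\mathrm{rel}} \;=\; \mathrm{Tr}_{\mathrm{discarded}}\!\bigl[\,U_T\,|0\rangle\!\langle 0|^{\otimes N q_n}\,U_T^{\dagger}\,\bigr],

where N is the number of nouns in the context text. Since k stays small while only U_T grows, estimating this probability to a fixed additive error needs a number of shots that does not scale with the context text.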
My task defines a cost function. Now I'm doing quantum machine learning. However, I didn't pick some random circuits that I found somewhere: my text circuit is informed by the problem. The problem here is language, and the structure that the circuit inherits from the problem is syntactic structure — the whole structure of the diagram. It's not just some random black-box circuit. You make a cost function for your task and you evaluate performance. One of the things we do is binary classification, the easiest task you can think of. You have a bunch of sentences; each sentence gets its own sentence circuit. Most of the words appear in more than one sentence, which is what allows the thing to train at all. Then you can do supervised learning. You have a train set and a test set. You keep the test set aside and you train the thetas such that the quantum circuits predict the label. Every sentence has a label, zero or one. This is a pregroup diagram like before, right? If you measure the qubit at the sentence wire, you will get some probability that it's zero and some probability that it's one — zero for one class label, one for the other. You train the thetas such that you measure the correct labels, i.e. the labels that the train set says these sentences have. After you have trained, you evaluate: with these word parameters, trained in task, you execute the circuits on the test set and you see your accuracy, i.e. how well the model generalizes to unseen data — unseen because you kept the test set aside. This is all this variational loop here. The train-set sentences become quantum circuits, as I showed you; they go to some quantum processor; you measure out the class-label probabilities; if you're not happy with how they match the train-set labels, you iterate — you update your parameters, the thetas, with some optimizer — and you loop around until you're happy. And then you evaluate on the test set, again on a quantum computer. You can do this with these DisCoCat diagrams at the sentence level.

I'm going to make a parenthesis and show you what we are working on now. Before, with DisCoCat, we had these two experiments published in these two papers, but they're toy experiments, in the sense that the data is artificial. It's 100 or 300 sentences, not that big — toy by NLP standards, the vocabulary is very small — but it works; it was a proof of concept. Now, if you change things and you don't use Lambek's pregroups — if you use just CCG trees — you don't need to post-select anymore. You just have tree structures. They're easier to train on a quantum computer; there are no barren plateaus. I'm not going to stay on this too long — it's work in progress, and it's very exciting. What we can do is tackle thousands or tens of thousands of sentences; they can be movie reviews, clickbait news titles, or even DNA sequences. If you don't have syntactic information, you can just use this middle thing, which is a binary tree with disentanglers, inspired by condensed-matter theory for critical systems. And if you combine these two ideas — syntactic information with this entanglement-filtering idea from condensed matter (if you have heard of them, they're called MERA tensor networks, or quantum convolutional neural networks) — we can combine the two and make syntactic quantum convolutional neural networks, and so on. And all of these things work very, very nicely.
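(A tiny, purely illustrative numpy sketch of the classification readout just described — the probabilities and labels below are made up, not from our experiments: each sentence circuit gives a probability p of measuring |0> on the sentence wire, read as the probability of class 0, and the training cost is the binary cross-entropy against the train-set labels:)

    import numpy as np

    # Hypothetical readouts: P(measure |0> on the sentence wire) for three train sentences
    p_class0 = np.array([0.9, 0.2, 0.6])
    labels   = np.array([0,   1,   0  ])   # ground-truth labels from the train set

    eps = 1e-9
    # label 0 has likelihood p, label 1 has likelihood 1 - p
    bce = -np.mean((1 - labels) * np.log(p_class0 + eps)
                   + labels * np.log(1 - p_class0 + eps))
    accuracy = np.mean((p_class0 < 0.5).astype(int) == labels)
    print(bce, accuracy)   # the optimizer updates the thetas to push the loss down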
And soon we'll run them on actual machines as well, and then there will be a paper, so stay tuned. And now, when we go to DisCoCirc — this is the interesting thing I want to close on, because it's something we have also seen working in task, and I will tell you exactly what I mean now. Instead of training in task, instead of doing quantum machine learning, supervised learning, what you can do is say: I want to do question answering, but I don't want to train my thetas in task — I will pre-train my thetas. To pre-train your thetas, you do basically what classical NLP does, which is make word embeddings that are pre-trained and task-agnostic. The usual methods you might have heard of are called word2vec or GloVe, things like that. We can use exactly these methods; specifically, I can talk about GloVe. You don't need to know too much now — I can show you what the cost function looks like, and you can ask me later. The point is that you can pre-train stuff. You can pre-train the component circuits that go into a big text, and then you trust the compositionality, such that when you stick them together in a big text diagram, which is hard to evaluate classically, the meanings compose in a meaningful way — they don't generate gibberish. That's why I say trusting compositionality: you trust the model. But of course, this is not blind trust. What we are doing is pre-training on some corpus — there is a kids'-book corpus that we found; it's very nice because it has a huge vocabulary but the sentences are small, the texts are small. You can take paragraphs and you can pre-train on co-occurrence matrices. What you can do then is take a question-answering task, which your pre-training had nothing to do with, and use it as a test of compositionality. This is something we are working on now. The most important thing I want to say is that when we pre-train, we can do it classically, because we train small components, and I can simulate the evaluation of small components of a big thing. But if I compose them together into a big quantum circuit, then I cannot evaluate it anymore; then I do need a quantum computer.

This is the GloVe-style cost function. The important part of this cost function is this D_ij. For word i and word j, I have a similarity measure, D_ij. This D_ij I can evaluate classically. It's the overlap, for example, of two noun states, or the overlap between two adjective processes, or the overlap between an adverb and another adverb. Or I can even compare things that are dissimilar: an adverb with an adjective, or an adjective with a noun. I can take all of these overlaps by just sticking things together compositionally and then substituting in their quantum circuits. But these overlaps are small — I mean, I can simulate 20 qubits on my laptop, right? It's fine; I can train these. And I want to train such that their overlaps match the other important quantity in this equation, which is this minus log X_ij. This X_ij is a huge matrix, a co-occurrence matrix that you gather from a huge corpus — like I said, the kids'-book dataset, or you can even do it from Wikipedia. You can take all of Wikipedia and count: X_ij is basically the frequency with which word i appears in the context of word j. That's it. You basically want to realize John Firth's quote inside Hilbert space, right?
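(Reconstructing the objective from the description above, rather than quoting our exact cost: it has the shape of the GloVe objective, except that the word-vector dot product is replaced by the circuit overlap D_ij, which is pushed towards minus the log of the co-occurrence count X_ij; f is a GloVe-style weighting that you fiddle with:)

    \mathcal{L}(\theta) \;\approx\; \sum_{i,j} f(X_{ij})\,\Bigl(D_{ij}(\theta) \;-\; \bigl(-\log X_{ij}\bigr)\Bigr)^{2}.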
I'm making my quantum processes have overlaps such that the overlap is proportional to the minus log likelihood of the two words being in the same context in the corpus.

Do you pick a window?

Yeah — it doesn't matter too much; these are details of how you train. People usually fiddle about with these hyperparameters and just find something that works. This is the black-magic machine-learning approach, right? And here we also use the black-magic approach: you fiddle about, see what works, and then you don't care as long as it works, because all you care about is that these overlaps are representative of the co-occurrences. Now that I have trained my quantum states and processes to actually be quantum word embeddings, what I can do with them is stick them into a question-answering task. There are datasets for question answering out there. We take them. We take the context text that states a bunch of facts, we get a text diagram that gives us a text circuit, and I just plug in my pre-trained word embeddings. Now I have a big text circuit. I cannot evaluate it; I need a quantum computer for it. I do the same for my question, and then I just measure on a quantum computer, as a test of whether compositionality actually works. So this is what I mean: it's not trained in task. If, however, the whole thing is small — because there exist very simplified toy question-answering tasks — you can also train everything in task: don't do any pre-training, create your circuits like this, and for every different question you get a different circuit to run. You can train all of these in task such that the correct answer comes out. You can do this; we have done this; it works. It even works with one qubit per wire. It was a WTF moment; it was pretty cool.

So what you get here is: you need exponential resources to do this classically if the thing is big, whereas on a quantum computer you can do it in polynomial time. I will be bold enough to say that this is an exponential advantage against simulating the model classically. And then, on top of that — let me go back and remind you that for this thing, when you take a sentence overlap, there is a very nice paper by Will Zeng and Bob, which started this whole QNLP business. There they said: if I have a vector and I want to see which effect has the highest overlap with it, I can invoke the closest-vector quantum algorithm, which is basically Grover under the hood. You Grover-search over the possible effects and you get a quadratic speedup. Okay, it was cool in 2016. Today, not many people care about quadratic speedups, because they believe that the error-correction overhead and all of that will be roughly quadratic, so the two will kind of kill each other off. However, on top of my very bold statement that I have an exponential speedup when I do question answering at text level with quantum DisCoCirc, I can also have my extra bonus quadratic speedup, because I can also Grover over the answers. This is something we are putting down in a theory paper now, so stay tuned. And yeah, now I think Richie will give you a very nice demo of lambeq, which is about sentence-level QNLP and automates all of the experiments I talked about — you can reproduce all of our papers — and he will also show you how you compose diagrams to make text circuits.

And yes, have you joined the Zoom? I sent it. Hello. Can everyone hear me? Yeah, hi. I'm going to talk about QNLP using diagrammatic software.
So this demo is going to be in roughly two parts. In the first half, you're going to see how we build and manipulate monoidal diagrams — i.e. the kinds of diagrams we've seen today — in DisCoPy, which is a library that we've developed. We'll talk about things at a higher level, and about how we convert things to quantum circuits. Then later on we'll talk more specifically about QNLP, and that will use lambeq, which is another software package we've developed. You have the notebooks in the channels, so if you want to follow along and run the cells, you can run them in advance.

So, in DisCoPy you can build diagrams using boxes and wires, as we've talked about today, and these boxes and wires have types. You first define the types of the wires by importing Ty and defining these types, and once you have that, you can have identity wires. So if you tensor these two wires together, food and water, you get this diagram with two wires in it. This is how you compose two diagrams together using the @ operation, which corresponds to the tensor product. You can do more with it — you can draw more arbitrary string diagrams. When you have a swap here, you have two systems, one containing water and one containing energy, and you swap them; now you're working in a symmetric monoidal category, which is a monoidal category equipped with this swap operation. To define a box, you need to give its input type and output type. In this case we have a human box, which is a process that takes in food and water and converts it to energy. The way you construct it is you import the class Box from DisCoPy, you give the box a name, and then you give it an input and output type. It's very straightforward, and you can draw it like this. After this, you can compose things together. Oh yeah — let's say pasta is a type of food and wine is a very impure form of water, and you can tensor them together again. Here is an example of an empty type: this is a box that takes in no wires at the top, and this is how you would define it in DisCoPy. So here you go: you have pasta and wine, and you can put pasta and wine into human. When you compose this diagram — you do pasta @ wine, followed by human — you get the following diagram, and it all type-checks. If it doesn't type-check, you'll get an error that the types don't compose. So you can see that, using this syntax, you can define arbitrary string diagrams for anything. It's a bit tedious, but you can do it for arbitrarily large diagrams by hand. You can also flip diagrams. We talked about how we work in a dagger category, and if you have a human, you can take the dagger of a human: you flip it upside down, and now you see these two boxes can be joined together. That would be like U-dagger-U in the quantum context, but right now it's a bit abstract — we're talking about humans and food and water.

So here's another concept we've talked about: functors. Functors are a kind of recursive way of mapping one diagram to another diagram, perhaps in a different category. It's a functor because it satisfies a property called functoriality, which means it's a structure-preserving mapping. All that really means is that you open up each of the boxes within the diagram and replace it with a sub-diagram that respects the types. So the way you do this is you first give a mapping on the objects.
In this example, I'm going to give a doubling mapping. Given whatever type, I map it to itself twice: food becomes food and food, water becomes water and water, and energy becomes energy and energy. Then I give a mapping for the boxes, mapping them to sub-diagrams: the pasta state becomes two pastas, the wine state becomes two wines, and for human — everything has to compose, right? — you need to add some swaps at the beginning to make things compose. And once you have this, you can convert any diagram consisting of just pasta, wine and human; you can keep applying this functor. So once I define this functor, I can turn this small diagram here into a larger diagram — now there are two people having dinner. So this is a way you can recursively and modularly build diagrammatic software.

So that is one example of a functor: you go from abstract diagram to abstract diagram. But now perhaps you want to embed some meaning into the diagram. So let's say I've defined this mapping: I'm going to turn each thing into two qubits. I want to model this as a two-qubit system: every food is a two-qubit system, every water is a two-qubit system. And then I fill in each box with some sort of quantum circuit, which I've defined here — it's not super important how I defined it, but you see you can build more logic on top of this software. This is like NumPy, but for diagrams: you can do any kind of diagrammatic thing you want with it. So I've defined another functor, and now the smaller diagram — pasta, wine, human — becomes this quantum circuit. Roughly speaking, pasta is modelled by this quantum state, wine by this quantum state, and human by this quantum process that takes four qubits in, outputs two qubits, and post-selects on two of the qubits. You can put other things in, but this is just a simple example. Once I've done this, I can combine these two functors together: I can take the original diagram, make it doubled, and then apply the quantum functor, and now I have a larger quantum circuit. So as long as I know what to do with the smaller boxes, I can convert any larger diagram that consists of a free composition of these boxes. Now you get an even more complex circuit, and it's very little code to do this. You don't want to build these circuits by hand every time — that's my point. You can think in this high-level diagrammatic schema. And once you have these circuits, you can further convert them. Here's another functor: DisCoPy comes with a functor that converts these quantum circuits to ZX diagrams, and specifically we map them to PyZX, which is a library for ZX diagrams. It's pretty cool: you can drag things around, you can apply rewrites. That's actually one of the challenges we have for the ZX team — you can try to extend this piece of software to do more interesting rewrites by hand in software. Here's another thing you can do: DisCoPy supports converting quantum circuits to tket format, which supports all sorts of quantum-circuit backends. So from DisCoPy you can evaluate circuits on machines from different vendors, which I think is quite cool. So from having dinner, we suddenly have this kind of quantum circuit, and here's its representation. Nice. Or you might want to just simulate it directly on your machine. So this is what you can do: you can leverage tensor-network libraries to your advantage.
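(Collecting the DisCoPy steps above into code — a minimal sketch in the older, pre-1.0 DisCoPy syntax used at the time; depending on your version the imports may live in submodules, and in the actual notebook the image of human under the doubling functor is built from swaps plus two copies of human rather than the placeholder box used here:)

    from discopy import Ty, Box, Functor

    # Types (wire labels)
    food, water, energy = Ty('food'), Ty('water'), Ty('energy')

    # States are boxes with empty input type; human is a process food @ water -> energy
    pasta = Box('pasta', Ty(), food)
    wine  = Box('wine',  Ty(), water)
    human = Box('human', food @ water, energy)

    # @ is the tensor product (side by side), >> is composition (one after the other)
    dinner = (pasta @ wine) >> human
    dinner.draw()

    # A doubling functor: every type goes to two copies of itself, every box to a
    # sub-diagram of the doubled type (placeholder box standing in for swaps + human @ human)
    two_humans = Box('two humans', food @ food @ water @ water, energy @ energy)
    F = Functor(ob={food: food @ food, water: water @ water, energy: energy @ energy},
                ar={pasta: pasta @ pasta, wine: wine @ wine, human: two_humans})
    F(dinner).draw()   # two people having dinner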
So DisCoPy integrates well with tensor-network libraries, and on top of that we've built our own simulator, which works very fast. This isn't your average state-vector simulator, which struggles at 20 qubits. If you have some sort of large circuit with 50 qubits, but the entanglement between the systems is relatively low and it has a tree-like structure, you can go up to 50 or 100 qubits. As long as you only care about a couple of qubits at the end, it knows how to contract things in a clever order. So it's really useful for our experiments: as you've seen, we use grammar, which is tree-like, and there's local entanglement, so we can actually simulate much larger systems than if you used Qiskit or something like that. But yeah, this is the output tensor for, I think, four qubits — I don't remember exactly how many qubits there were. As a quick recap: say we have some sort of process that we model using this kind of diagrammatic notation; we can then embed meaning, using functors, by converting this high-level schema into quantum circuits. So if you believe this is how your problem works, that the process can be modelled like this, and you believe that, say, each box can be modelled with a quantum state or some unitary or density matrix or CPTP map — this is how you could build software in general, right? You have some sort of problem, you build the diagrams, you put quantum things into them. We're going to use this kind of idea to do QNLP: it's a way to develop high-level diagrammatic schemas.

So now we're going to move to lambeq. lambeq is a QNLP toolkit written in Python. Both of these packages are open source; you can use them and contribute to them if you want, and there's a community Discord where you can go and ask questions. Yeah, so it helps you develop models for QNLP. Here's one model, for example. Here's a sentence: fat cats eat rats. You might say: I want to combine these words in such a way that I don't care about the ordering of the words. So you can combine them using a spider — as we know, spiders fuse and unfuse. So this doesn't capture the ordering of the words; it only cares about what's in the sentence. You build this schema, you convert it to a quantum circuit, and this ends up corresponding to a very old natural language model called bag of words: you're essentially element-wise multiplying the word embeddings for each word, and it's surprisingly effective, so it's a good baseline. Here's another example. You might want to care about the ordering of the words, reading in the words, these tokens, from left to right. This kind of resembles the architecture of a recurrent neural network: you have a starting token and then you repeatedly insert words into it, and these wires in the middle carry the hidden state of a recurrent neural network. As you can see, you don't necessarily have to put quantum circuits in. If you don't have cups and caps, you actually just live in a normal monoidal category, not a compact closed one, so you could put all sorts of interesting things in for your semantics. But yeah, if you put a quantum circuit in, you get a quantum RNN, which is nice. We're also interested in other things: we're interested in grammatical models.
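(A quick toy numpy illustration of why the spider reading above collapses to bag of words — the three-dimensional embeddings below are made up: contracting one Z spider with all the word states is just the element-wise product of the word vectors, so any reordering of the words gives the same result:)

    import numpy as np

    # Made-up 3-dimensional word embeddings
    emb = {'fat':  np.array([0.2, 0.9, 0.1]),
           'cats': np.array([0.7, 0.3, 0.5]),
           'eat':  np.array([0.4, 0.8, 0.6]),
           'rats': np.array([0.9, 0.1, 0.2])}

    def bag_of_words(words):
        # A Z spider contracted with the word states = element-wise (Hadamard) product,
        # so word order is forgotten
        v = np.ones(3)
        for w in words:
            v = v * emb[w]
        return v

    print(bag_of_words(['fat', 'cats', 'eat', 'rats']))
    print(bag_of_words(['rats', 'eat', 'fat', 'cats']))   # identical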
We think we want to model these problems based on structure, and we think that the structure comes from the grammatical structure of the sentence. So if you want to find the meaning of a sentence, you might want to combine the words in the way the grammar tells you to. Here's an example with another sentence: cats eat rats. Cats is a noun, rats is a noun — subject and object — and in the middle you have a word known as a transitive verb, which takes a noun on the right and a noun on the left and gives you a sentence. So far I haven't combined them yet, but as you can see, it's kind of obvious how to combine them; we've done this a lot today already. We can do this in just a few lines of code — maybe one line, even. And here you get a DisCoCat diagram. When the output here is a sentence type, you know it's a grammatically well-formed sentence, so we can convert it to a quantum circuit and try to see what it means. Here's another example: fat cats eat rats. Here, fat is an adjective, which takes a noun and gives you another noun; that's why it has this noun-times-noun-left-adjoint type. As you can see, the types of the other words haven't changed, and that's a property of the pregroup grammar Bob talked about earlier today. It's a lexicalized grammar: you give the type to the word, and in many cases you can combine the words in the same way, using the same types. If you've studied formal linguistics, you might have come across constituency grammars; they're kind of similar — you have your nouns, your N's, your NPs, your VPs, your S. This is similar, but these are categorial grammars, which I'll talk a little more about later. So, another well-formed sentence — that's fine. Of course, it would be completely impractical to manually specify the parse of each sentence, but lambeq comes with a state-of-the-art CCG parser known as Bobcat, which automatically parses sentences and converts them into these diagrams. It's a neural parser that we trained, and it's a state-of-the-art parser for categorial grammars, which is quite nice. So here it is in action — well, I ran it five minutes ago, twenty minutes ago. If you input such a sentence, it will give you the parse tree, and this is what is known as CCG, Combinatory Categorial Grammar, just another type of categorial grammar. It's quite similar in spirit to what Bob mentioned today with pregroup types, pregroup grammars. So here's the parse tree, and here's what the parse tree looks like in DiscoPy. And you can functorially map this again — another structure-preserving mapping — into this DisCoCat diagram you can see here. So again, the same thing, but now it looks like a pregroup grammar. And now we're ready to do DisCoCat experiments on it.

So I have this corpus of text. It contains a lot of sentences, and you can read them in and parse them all at once, batched. This line of code reads in a list of sentences and converts them into a list of diagrams. As you can see, some of these sentences are actually quite long and complex, and you can have a look to see that the grammar kind of makes sense — like, there's a noun here, and 'the bear' comes out as a noun, which makes sense; conjunction types, determiners, and so on. That's the sort of thing you're interested in squinting at. This swap is an artifact of the CCG grammar; it's a rule known as crossed composition.
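(Backing up to the type bookkeeping: written in pregroup notation, as in Bob's session, the transitive verb gets the compound type n^r · s · n^l, the adjective gets n · n^l, and the cups implement the cancellations n · n^r -> 1 and n^l · n -> 1, leaving only the sentence type:)

    \text{cats eat rats:}\quad n \cdot (n^{r}\, s\, n^{l}) \cdot n \;\longrightarrow\; s,
    \qquad
    \text{fat cats eat rats:}\quad (n\, n^{l}) \cdot n \cdot (n^{r}\, s\, n^{l}) \cdot n \;\longrightarrow\; s .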
So how do we map these diagrams to quantum circuits? We use parameterised quantum circuits. lambeq comes with a number of ansätze, and here we use the IQP ansatz, which consists of a layer of Hadamard gates, followed by a layer of diagonal gates, followed by more Hadamards. If you replace each box with this kind of sub-circuit, you get an overall circuit that describes your sentence. Here's what it looks like: the original diagram was "fat cats eat rats", and you end up with this quantum circuit. Does that make sense? All right, cool.

You can also send these diagrams to tensor networks. Although lambeq is primarily a QNLP package, we can also do classical tensor networks, and that's another interesting way to train these models. So we convert the diagrams into tensor networks, where each wire carries the dimension of a vector space: this box is a vector, this one is basically a matrix, this one is a higher-order tensor, and the wires correspond to tensor-network contractions. To perform the contraction, you just evaluate the diagram.

Now we come to the rewriting section. Sometimes you want to run things on a quantum computer, but your quantum computer is small, so the circuit doesn't fit on it, or it takes too long to run, and so on. What can you do? You can apply rewrites. I've talked about functors in DisCoPy and lambeq already, and you've seen a lot of them; you can also use them to reduce the size of your diagram. How? In this example we have "cats eat rats on mats", and the word "on" has five wires, so it's an order-five tensor. If we give the sentence and noun spaces dimension 100, which is reasonable if you want a hundred numbers to describe a position in a high-dimensional space, you end up needing 100^5 numbers to describe this tensor, which is far too large; we can't even contract it efficiently. So we want to reduce the size of this tensor. lambeq comes with rewrite rules you can apply: these wires of the preposition, with types n^r and n^{rr}, are basically just carrying the meaning of "cats" in and out, so you can replace the word "on", which had five wires, with a sub-diagram of three wires plus a cap. When you put it all together you suddenly have smaller tensors, and you can then remove the cups and caps using the normal-form method, which applies the snake equation: you see here there's a really long cup and a small cap, they form a snake, so they can be removed. And now you're using fewer qubits, which is good.
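Here is a minimal sketch of the ansatz and rewriting steps, assuming lambeq's IQPAnsatz, Rewriter and remove_cups as documented; the rule names and the example sentence are illustrative and may differ between versions.

```python
# Hedged sketch: rewrite a diagram to shrink it, then map it to an
# IQP-parameterised circuit.
from lambeq import AtomicType, BobcatParser, IQPAnsatz, Rewriter, remove_cups

N, S = AtomicType.NOUN, AtomicType.SENTENCE
parser = BobcatParser()

diagram = parser.sentence2diagram("cats eat rats on mats")

# Shrink costly words (e.g. the preposition) with built-in rewrite rules,
# then pull out the resulting snakes with normal_form().
rewriter = Rewriter(["prepositional_phrase", "determiner"])
reduced = rewriter(diagram).normal_form()

# One qubit per noun/sentence wire, one layer of the IQP ansatz per box.
ansatz = IQPAnsatz({N: 1, S: 1}, n_layers=1)
circuit = ansatz(remove_cups(reduced))  # removing cups saves further qubits
circuit.draw()
```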
lambeq also works as a standalone tool for formal grammars; you can use it just to parse things. If you run it as a command on your terminal, lambeq followed by a sentence, it prints a pregroup diagram with some very nice ASCII/Unicode drawing code; it's always fun to write that sort of thing. So now that we have a way to turn sentences into parse trees, and parse trees into parameterised quantum circuits, we can train them, as Konstantinos described. It's essentially a typical supervised-learning training loop, and lambeq has support for that too (a short code sketch of the full training setup appears at the end).

What's this doing here? Oh yes, you can do further rewriting: you can bend this word around, which is the categorical transpose Konstantinos was showing you earlier. The functor here is really nice: I've given it the mapping of the nouns as states, but this dagger box is more like an effect, the dagger of the original state, and I don't need to give a new mapping for it, because if I know the mapping of the original word, then its dagger is just the dagger of the state it maps to. All of these algebraic properties are encoded in the functor, which is really nice. And you get this really narrow quantum circuit; it would have been something like a dozen qubits, and now it's only a few. Cool.

Okay, back to training the model. First you build the model by passing it all the diagrams you're about to train on, the training circuits and the validation circuits, so that it knows which symbols appear in them. As you can see, each word gets filled in with gates whose rotation angles are the parameters. The NumpyModel just collects the symbols that appear in your circuits, and the optimisation then runs over them using gradient descent. You define your loss function; here we choose binary cross-entropy, which I think someone was asking me about earlier. And we define an accuracy function, which just checks how many of the predicted labels match the ground-truth labels in our data set. Once you have the loss function, the accuracy function, and a model filled with the training and validation circuits, you give the trainer some hyperparameters and start training. And you get this. It takes a little while, so I ran it ahead of time, but as you can see, roughly speaking, loss goes down and accuracy goes up, so it works. If you play with it more, you get better accuracy, and some of you will be doing exactly that this weekend with the QNLP projects.

So, in conclusion: lambeq is cool, DisCoPy is cool, you can pip install them, they're open source, and you're welcome to contribute. If you have any questions, there's a Discord where you can talk to us. Are there any questions about lambeq, DisCoPy, compositionality, category theory, or QNLP in general? So, yeah, thanks to everyone who has worked on this project, and thanks also to Ian for his contributions. Thank you, and thanks for listening.

Well, I think if the questions are more technical, it's better to take them at the end, so we can call it a day for today. Yeah, we're done. Thank you very much. I didn't time this at all, it was put together rather last minute, but somehow the timing worked out. The energy level could have been higher, but it was very good.
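For reference, here is a minimal end-to-end sketch of the training setup described above, loosely following the lambeq training tutorials; the toy sentences, labels and hyperparameter values are placeholders, and exact trainer arguments may differ between lambeq versions.

```python
# Hedged sketch: supervised training of sentence circuits with lambeq.
import numpy as np
from lambeq import (AtomicType, BobcatParser, Dataset, IQPAnsatz,
                    NumpyModel, QuantumTrainer, SPSAOptimizer)

# Toy data: two sentences with made-up one-hot labels.
sentences = ["cats eat rats", "rats eat cats"]
labels = np.array([[1.0, 0.0], [0.0, 1.0]])

parser = BobcatParser()
ansatz = IQPAnsatz({AtomicType.NOUN: 1, AtomicType.SENTENCE: 1}, n_layers=1)
circuits = [ansatz(parser.sentence2diagram(s)) for s in sentences]

def bce_loss(y_pred, y_true):
    # Binary cross-entropy over the predicted label distributions.
    return -np.mean(np.sum(y_true * np.log(y_pred + 1e-9), axis=1))

def accuracy(y_pred, y_true):
    # Fraction of predicted labels matching the ground truth.
    return np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1))

# The model collects every symbol (rotation angle) appearing in the circuits.
model = NumpyModel.from_diagrams(circuits)

trainer = QuantumTrainer(
    model,
    loss_function=bce_loss,
    epochs=50,
    optimizer=SPSAOptimizer,
    optim_hyperparams={'a': 0.05, 'c': 0.06, 'A': 1.0},  # placeholder values
    evaluate_functions={'acc': accuracy},
    evaluate_on_train=True,
    seed=0,
)

train_data = Dataset(circuits, labels)
trainer.fit(train_data, train_data)  # toy run: validating on the training set
```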