The second part of my lecture is applications of tensor networks to machine learning. All right, so OK, welcome back for the afternoon part. So as the title says, in this talk I'm going to be telling you more about applying tensor networks. And that includes doing calculations with them, meaning specific pieces of calculations: how can we make these pieces more efficient? And then I'll touch on the important topic, which we unfortunately don't have time to go into in depth since I'm only giving a few lectures, of optimizing and obtaining tensor networks. But I'll say some things about it, hopefully just to get you started down that road, and I'll give you some resources you can look at for more details. And then in the last part of the lecture today, I'm actually going to step outside of tensor networks, even outside of physics for a little while, and just do a pretty broad introduction to machine learning, so that we have that background for tomorrow. So hopefully that'll be interesting to all of you anyway, even if you're planning to use machine learning not involving tensor networks, say, for something you want to do in your own projects or in physics. OK, so let's get started. So the first part of the talk is actually about computations with matrix product states, meaning what are some of the technical manipulations we can do with them, and what are the costs of those? And hopefully it helps you to see the benefits of this idea. So far, all I've really told you is that they compress a wave function, meaning that a full wave function would have 2 to the n or 4 to the n parameters, and you can get that number down. That's good. However, a key point is not just parameter counting, but how easy is it to access those parameters and do manipulations with them? Because there are other forms of wave functions that are very good at compressing. There's restricted Boltzmann machines, or these neural-net wave functions. There's correlator product states. There's Slater-Jastrow wave functions. There's all these other forms of wave functions that also compress the wave function quite a bit. So what are some specific advantages that, say, matrix product states, and then also tensor networks more broadly, have? OK, so let's get into that a little bit. So here you'll have to bear with me and imagine we already got a tensor network from somewhere. So we already did a calculation to obtain one. And actually, some parts here are maybe a little bit out of order; I'll tell you in a bit how you can get them. But let's say we already got one, and let's see why we might want to have it in matrix product state form. So the reason we might want a wave function given to us in matrix product state form is that many important steps of calculations become efficient that otherwise wouldn't be efficient. Most operations with a full wave function just wouldn't be efficient; even storing it would be inefficient. So let's be concrete about it. Let's consider a full wave function, some full many-body wave function with n sites. So that's this graphical notation for the norm, or the overlap of a wave function with itself. Or it could be two different wave functions. So what is that? That's a sum over all 2 to the n settings of the indices at once. So in this case of n = 10 indices, you just have to directly sum 2 to the 10 terms, and it only gets worse as n grows. That's a no-go.
And by the way, in machine learning, they have problems of this type. So you might have a neural network, and you say, what's the norm of the neural network? Because sometimes you want that. If the neural network represents a probability, you might want to normalize the probability. You run into the same issue there. However, you don't have such elegant ways to do it there. But for matrix product states and tensor networks, you do have really elegant ways to do it. So let's see what that is. So if you have a matrix product state form of the wave function, is there some more efficient thing you can do besides just summing over all the physical legs? Like here I've written five of them. So instead of doing all two to the five terms, just one after another, is there a better thing you can do? So by the way, you could do it that way. You could say, OK, let me set all of these spin indices that run through the middle here. That's kind of analogous to these lines here. I could just set them to all different values that they take. And for each one, I could do something kind of efficient. Let me just show you how that would go. So this may be giving you some appreciation for why there's an even more efficient thing that we're going to do in a minute. So I could say, OK, let me fix the values, S1, S2, S3, S4. Then for each one of these choices, S1 through S4, imagine those are fixed numbers, just I set those. Then what I can do is I can say, OK, I can recover this amplitude efficiently by just multiplying matrices together. Because I can say, let's be concrete about it. Let's say the pattern was up, down, up, up or something. So then I'm done with those lines. Those lines are fixed. So maybe the up ones I'll shade in. And the down ones I'll leave as a white circle. So fixing those lines had some effect on those tensors. It was like taking a slice of those tensors. So now that's just a chain of matrix multiplications. That's just a vector times a matrix times another matrix times a vector. And that's efficient. I can do that efficiently. I'd still have to do it, though, two to the n times. That's the problem. So even using the efficiency of retrieving one amplitude from the matrix product state doesn't totally save me here. So I have to do something even more clever. So here's the more clever thing you can do. So the clever thing you can do is you can just number the tensors here, like 1 to 10 in this example, being concrete about it. And then you can just think of this as some network I have to contract. And you can even forget that it represented a norm. And you can just contract it efficiently in steps. You can say, OK, let me contract tensor number 1 with tensor number 2. And so I sum over this line between them. And the result is that I get a new tensor that has these two lines sticking up. Then I can stick on tensor number 3 onto that tensor, whatever that blue thing is. And then I contract over this leg. And now these three indices are sticking up. And then keep going like that. Put on tensor number 4. Now I'm back to a thing with two indices. Then I just repeat these two steps over and over and over again. And I'll show you an animation. So the idea is that I only have to figure out how to do these kinds of moves until all the tensors are contracted. So here's a little movie of what that looks like. So basically, I combine those two. Then I merge into that one, then the next one, then the next one, then next one, then next one. So that's the movie. 
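Just to make that movie concrete, here's a minimal sketch in Julia of computing the norm this way, going from the left. All the names here are made up for illustration, and it uses nothing but reshapes and matrix multiplications; each site tensor is stored with indices (left bond, physical, right bond), and each of the two moves is one of the O(M³D) steps we'll cost out in a moment.

```julia
using LinearAlgebra

# Compute <psi|psi> for an MPS by "chomping" from the left.
# E carries the contraction of everything to the left, with
# indices (bra bond, ket bond).
function mps_norm(A::Vector{Array{Float64,3}})
    E = ones(1, 1)
    for Aj in A
        Dl, d, Dr = size(Aj)
        # Absorb the next ket tensor: (Dl x Dl) * (Dl x d*Dr) -- an O(M^3 D) step
        T = reshape(E * reshape(Aj, Dl, d * Dr), Dl, d, Dr)
        # Absorb the matching bra tensor, summing bond and physical index: also O(M^3 D)
        E = reshape(Aj, Dl * d, Dr)' * reshape(T, Dl * d, Dr)
    end
    return sqrt(E[1, 1])   # no legs left sticking out: E is 1x1, a scalar
end

# Example: 10 spin-1/2 sites, bond dimension 8
d, m, N = 2, 8, 10
A = [randn(j == 1 ? 1 : m, d, j == N ? 1 : m) for j in 1:N]
println(mps_norm(A))
```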
So it's just kind of chomping the thing down from the side, like a little Pac-Man or something. And then that equals a scalar. Why does that equal a scalar at the end? No legs sticking out, right? So that's a scalar. And we'll talk about the efficiency of that in a second. And then you can do similar things to get other kinds of calculations. So that was the norm; we'll get into the scaling in a second. Another thing that you could do similarly is an expectation value. So hopefully you see why this is an expectation value. So here I'm taking two copies of the wave function, one on the top, one on the bottom, written in this matrix product state form. I'm summing over all the sites. But on two of the sites, I stuck an operator in there. So this A is some arbitrary two-site operator; pick any operator you want. It could be sigma-z times sigma-z or something, but it could be some more complicated one, could be some term from a Hamiltonian. And then I can do similar things, where I bring the left in and bring the right in by combining tensors from the left and from the right. And now all I have to do is contract this diagram, which I can do efficiently. There's some finite cost to doing that, and I can do it efficiently. So what does this cost me? Well, let's start giving some numbers for these things. Let's say that all the legs going through the middle, which in MPS parlance are called the bond dimension, because these are like the bonds of the lattice, but you could call it the internal dimension or virtual dimension, run over M values. Let's call that M. That's like the typical matrix size of these MPS matrices. And then let's say that the physical legs, the spins or whatever they are, electrons, run over D values. So D would be 2 for spin one-half, 3 for spin one, 4 for the Hubbard model, things like that. So let's say M and D are our numbers here. So then there are two operations that we do over and over again when we consider this norm calculation. One of them was: take this partial tensor from before, stick one more MPS tensor onto it, and get this funny thing with three indices. And that one involves doing a sum over M values while holding these other three indices fixed. So we hold this index to some value, and this one, and this one. But the thing is, there are M different values we can hold this one to, D different values we can hold that one to, and M for that one. So it's like having to do a sum over M things, M times D times M times. So I pick 1, 1, 1 for these three, then I sum over M values. Then I pick 2, 1, 1 and sum over M values. Then I pick 1, 2, 1 and sum over M values. I just do all the combinations of holding these open lines open and then summing over the internal one. And when you work that out, you basically just multiply together all the letters that you see. That's the general rule for estimating costs of tensor contractions: for every line that's involved, write down its size, and multiply all those numbers together. The point is you don't double count the ones that are summed over; you just count each line once. So the scaling here is: this is an M cubed D cost diagram to contract. Any questions about that? Because that's an important concept for knowing the cost of these things. So that's an M cubed D cost to make that one. The other operation is, when I've already made that funny thing with three indices, I put one more MPS tensor on.
So that'd be like, if I've already combined those three together, now I want to stick that one on. Or maybe I've already combined these four together, and I want to stick that one on. So that one also is an M cubed D. I look at every line that takes part in this diagram, and I write down the values, the ranges I mean, and then I multiply them together: M cubed D. Yeah, so you can definitely use Monte Carlo for this kind of thing. So I mentioned to a few people at the break that one interesting future research direction for these algorithms is to hybridize them with Monte Carlo ideas. So if I had more time, I would describe an algorithm I really like that does this for finite temperature systems. It's called METTS, M-E-T-T-S. That's a really nice one. But yeah, that's an area that is definitely deserving of further investigation, doing these steps with Monte Carlo. It's certainly something that can be pursued. So the tricky part is how to get the best of both and not the worst of both. If you don't make a good combination, you might just get an expensive thing that has big error bars and a terrible sign problem. But if you can harness it well, it can be a winning strategy. The tricky part is that, let's say I sample this part somehow: I think of this as a sum, and then I do it with sampling. That'll have some kind of error bars left over, because I can never take all the samples, otherwise there'd be no advantage in sampling. But then the question is, what is the fate of those error bars? Will the next step that I do control those error bars? Or will it multiply those error bars by some number bigger than one? And will things grow and get out of control? So you have to be doing it in a context where the errors propagate well. That's the key thing to think about, or one of the key things to think about. The other thing is you've got to keep the sign problem from roaring back in. So the sign problem can creep up on you really easily if you put sampling into things. So that's another thing to think about. OK. Yeah. Yeah. Good observation, yeah. That's right. So yeah, that's worth pausing and talking about in a bit more detail. So I had some slides on that that I was thinking of including, and maybe it's good just to do that on the board. So let's just think about some general tensor contraction. And let's say we have a funny one like this: we have some tensor A with four lines, and we have some other tensor B with three lines. Let's say we're doing that contraction. And let's just call these indices i, j, k, l, m. And then let's just say that i goes from 1 up to d_i, j goes from 1 to d_j, and so on; d just means dimension, dimension of i, dimension of j. So then what is this diagram? That diagram is just a sum. And it's just a sum over all the connected indices, that's l and k, holding i, j, and m open. So it's A_ijkl times B_lm. And so then the way this really works on the computer is that I have to address i and j. So I have to sort of fix those. I'll just draw arrows to indicate the ones I'm fixing. Oh yeah, B also has a k index. Thank you. Yes. So thank you. So the point is I have to hold i and j and m fixed, and kind of think of that as some slice out of A and some slice out of B. Now those are fixed temporarily. Then that's just a sum over, that's like some kind of generalized dot product that I have to do over k and l, if you want. So that dot product costs dimension of k times dimension of l.
So each of these inner computations costs d_k times d_l to do. But then, how many different ways can I hold these other indices fixed? And it's d_i, d_j, d_m. So these are how many different ways I can kind of clamp these and hold them fixed. And for each different way of doing that, I have to do this sum. So then you just multiply all the dimensions together. That's the idea. And that's the important thing: you might think you should somehow count the contracted ones twice or something. You don't. You just look at the whole diagram once it's connected up, and everything that's participating counts exactly once. So that's an important thing to know about. And in fact, don't feel dumb if you didn't know that, or if this is new information. Because there was a very funny time, maybe some of you are just old enough to remember, when this topic was sort of taking off. And some of the people who were coming from quantum information and working on this topic were kind of boldly writing down these algorithms for 2D and not really paying attention to the costs of the different steps. And they were saying, do this, and then do that, and do that, and do that. But then the people who actually had to program it would try it, and they would get an algorithm that would scale as some bond dimension to the 20th power or something. But then the people who were proposing this were saying, yeah, but it's not exponential. Because if you read these papers about complexity theory, it'll say, anytime you go from exponential to polynomial, it's efficient, or something. But I mean, something to the 20th power? That's not really going to be practical to do. So now people are wiser, and they know you have to think really carefully about the scalings of each step. So let's just recap those last two slides. So the two key operations were this one, M cubed D, and this one, M cubed D. So the whole thing is M cubed D. And that's great. Because if you think about it, underlying a matrix product state are matrices. At every site, there's two matrices, in the spin-half case. And matrix operations, you may or may not know, generally scale as the cube of the linear size of the matrix. So what this means is that we're doing things with high-dimensional tensors, with wave functions, by storing these matrices, and we're doing everything with the most efficient scaling you can do for matrices. So this is a good sign when you see this. So this is really efficient. I mean, M cubed, you know, it would be better if it was M squared or just M. But M cubed you can push very hard. So you can take M to be many thousands and afford that. So that's very favorable scaling. OK, so that's the good news there. And so the rule of thumb, in fact, with MPS is that you always want to see if you can get M cubed, where M is this typical bond dimension of the MPS. If you can do anything with M cubed, you're in business. M squared is even better, and there are a few steps you can do with M squared. But if you do anything that's M to the 4, maybe, but you should worry about it. And if it's higher than that, you're doing something wrong, basically, or you're not really getting the benefit of matrix product states that you should be. So that's just kind of a word to the wise if you're going to get into this business seriously: M cubed for matrix product states. Now for PEPS and these other fancier tensor networks, you have to settle for less.
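Here's a literal-minded sketch of that counting rule, applied to the board example, with made-up dimensions. Every distinct line in the diagram becomes one loop, so the total cost is the product of all the dimensions, whether a line is open or summed over:

```julia
# Contract A (lines i,j,k,l) with B (lines k,l,m) over k and l, leaving i,j,m open
function contract(A::Array{Float64,4}, B::Array{Float64,3})
    di, dj, dk, dl = size(A)
    dm = size(B, 3)
    C = zeros(di, dj, dm)
    for i in 1:di, j in 1:dj, m in 1:dm   # clamp the open lines, one setting at a time
        acc = 0.0
        for k in 1:dk, l in 1:dl          # the generalized dot product over connected lines
            acc += A[i, j, k, l] * B[k, l, m]
        end
        C[i, j, m] = acc                  # total cost: di*dj*dm*dk*dl, each line counted once
    end
    return C
end
```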
So there you'll have bond dimension to like the 6 or the 7 sometimes. And that's just life right now with those tensor networks. And so people are trying other strategies to make them more practical to work with; Monte Carlo is one people are thinking about, actually. OK, so that was just a few slides about some technical aspects of, if I have a matrix product state, what are some things I can do that I couldn't do otherwise. So I can actually extract local properties, like expectation values of operators, efficiently, and normalize the state efficiently. These are non-trivial things. You can't necessarily do that with other forms of a wave function so easily. I mean, you can with Slater determinants and things, but this is really important. Because of that capability, we have lots of nice ways to optimize matrix product states and other tensor networks, so I'll talk a little bit about that. So let's talk about applying them to physics. So this is just a kind of lightning introduction to DMRG, which is how you actually obtain matrix product states, say for ground states of many-body systems, even excited states. So this may be a little too fast, but hopefully it will give you the flavor. And then I'm going to point you to some resources that have more in-depth explanations. Also, I forgot to mention, later this week Uli Schollwöck, who's a real expert on DMRG, is going to be giving you more detailed lectures about that and other topics, so look out for that. But let's start with it at this level. So let's think about the many-body problem as a kind of tensor optimization problem. And you can certainly think of it this way. So let's say we have a many-body Hamiltonian, and I'm just writing it right now as this giant object with 2N indices. So I have N bra indices and N ket indices; in my head, I'm thinking of them as bra and ket indices. So basically, it's an operator, so it has to have two copies of the physical indices. And then the wave function has one copy of the physical indices. There's the wave function, the ket, and there's the bra version of the wave function. So that's what the energy looks like if the wave function is a matrix product state. Any questions about that diagram, like what I mean there? That's just putting the wave function on both sides of the Hamiltonian. But I haven't said anything about the Hamiltonian yet, whether it's local or what. And then what we want to do is try to update the numbers that live in each of these blue tensors, these wave function tensors, the same on top and bottom, it's the same wave function twice, until we get this energy to go down for a normalized wave function. So somehow we want to just update these. So one thing we could do, actually, is just compute a gradient and do gradient descent on them. But there's something smarter that you can do. So one thing you do that makes life a little easier is you say, OK, instead of trying to update all the numbers in these wave function tensors all at once, all at the same time, which can be a good idea in some contexts, what tends to work better for these is to freeze most of them for a while and just vary a few. And this is one of the first ingredients in the so-called DMRG algorithm. So what you do is you say, OK, on this step let's vary the third one. Maybe in a later step we'll vary the fourth one and the fifth, and then we'll go back and forth.
But for now, let's say we vary that one and hold the other ones, in blue, fixed. So then in that form, you can think of sticking all the other ones that are frozen onto the Hamiltonian, and they transform the Hamiltonian into a reduced or effective Hamiltonian just for this red one. So you can think of this as a little mini Hamiltonian for the red one. I'm not sure if I have a drawing of it. Let's see. Let me draw more of that on the board, just to be more concrete about it. So if you look at that thing in the middle, it has three indices sticking out of the bottom, one, two, three. And it has three sticking out of the top. So you can write that thing in the middle as some kind of three-leg thing like that. And then this tensor that we're varying comes on like that, and the result is another tensor with three legs. So you see how that all fits together. So that's like H times this little tensor here equals another tensor that's similar to it. So that's how you'd multiply by H, or you could take this, stick it on top, and then you would get the energy, but just involving that tensor in red. So that's what I mean by this being kind of an effective Hamiltonian for it. Now, I'm leaving out some important technicalities, which is that for this to really be properly thought of as an effective Hamiltonian, these transformations in blue, it's best if they're unitary transformations. But if I just put random numbers into this MPS, they won't be. So it's best if you arrange for them to be unitary, and it turns out you can do that without loss of generality; that's a whole lecture in itself. But it turns out there's always a way to make these blue things unitary, individually and collectively. And so this effective Hamiltonian really has properties that are very much reflective of the original one. So that's an important detail. And also it makes it a regular eigenvalue problem rather than a generalized eigenvalue problem. So now we've reduced it to just an eigenvalue problem for this red guy. And then we also have to use the fact that H itself has some structure, otherwise it wouldn't be efficient to do this. So we have to remember that generally we're talking about Hamiltonians that are sums of local operators. I mean, there are more complicated ones that might have power-law interactions and things like that, and we can deal with those as well. But the simplest case is, let's say it's 1D and you just have a nearest-neighbor lattice model. So then the Hamiltonian is just the sum of these local terms. And these lines are identity operators; someone asked earlier about how you notate those. So these are just tensor products of identity operators. That's one thing that's interesting about doing numerics and working with these kinds of algorithms: when you do theory on paper, you often kind of gloss over these details. You write S1 times S2, and you know what you mean by that. But really, that's shorthand, of course, for S1 times S2 tensor identity-3, identity-4, all the way to identity-N. Sometimes you forget those things when you're doing pencil-and-paper theory. But on the computer, you actually have to tell the computer that there's an identity on every other site. Computers don't know these kinds of things implicitly. OK, so then this is the most complicated slide, but it's just to give you a flavor. So let's take that form, and then imagine sticking it inside here, kind of wrapping it up in all the parts of the MPS that we're not going to optimize on this step.
And then see what happens when the dust settles. This is mostly just a motivating slide, just to give you a flavor. So what I've done here is I've exploded the Hamiltonian out into the first term, the second term, the third term, the fourth term, the fifth term, on a finite small lattice. And then for every term, I've wrapped the MPS around it on top and bottom, leaving out the third site. So any questions just about what I'm showing here at all? So these are all the terms that are inside this little mini Hamiltonian. Then what you start to see is that some of these terms have things in common. The right-hand side of this one is the same as the right-hand side of that one. And the left-hand side of these three is all the same. Even some little mini pieces are the same: if I contract those two tensors together, that's the same as contracting these two together. So I don't have to calculate all these separately. I can do smart things to try to make this more efficient. So what that looks like is I say, OK, I'm going to calculate that thing once on the right and call it this tensor that has these two lines coming out of it. And I'm going to bring all those things in on the left end, and then I'm going to do further reductions. You can do these kinds of left-right reductions, where basically this thing in blue indicates that there are no operators on the right; that's just the identity being stuck into the right basis. And over here, this is the identity being stuck into the left basis. This is some right piece that involves having a term completely on the right, and terms completely on the left. And then finally I have these complicated pieces where I have a term that's actually touching the site that I'm optimizing. So this is all the terms on the right in the Hamiltonian, all summed together; all the terms on the left, all summed together; and then these middle terms that have to do with the terms that actually cross that site. Don't worry if you're not catching all these details. This is just to show you where the technical parts live in one of these calculations. But then I can actually do optimization in this form. So I do that work once, and then I have this little mini Hamiltonian. Now I can use efficient algorithms like Lanczos or Davidson, which are basically state-of-the-art ED, exact diagonalization, algorithms, to optimize that tensor. That's the upshot here. That's the payoff. So instead of having to resort to something slow, like gradient descent, I can use all the speed and power of Lanczos or Davidson. And the key step with those algorithms is you have to multiply whatever wave function you're optimizing by the Hamiltonian. So you multiply it by the Hamiltonian, then you do some transformations to make an orthonormal basis. Then you hit it again with the Hamiltonian, over and over and over, maybe three or four times if you're doing a partial improvement. And so this is the key thing you need to be able to do: stick the wave function in. That's putting this red tensor into these four diagrams. The result is a three-index thing from each of them, you add them back up, you loop this a few times, and you get really fast convergence to the ground state from that step. There is. So it just gets into lots of details, but you mean this thing of zooming in on one site at a time? Or? Oh no, so this decomposition into these gray two-site terms, that was an assumption here.
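Just to make that loop concrete, here's a minimal Lanczos sketch in Julia. Everything here is a generic illustration, not any package's actual code: apply_H stands in for contracting the effective Hamiltonian with the site tensor (treated as a flat vector for simplicity), and a handful of iterations already give a big improvement, which is why DMRG prefers this over gradient descent.

```julia
using LinearAlgebra

function lanczos_ground_state(apply_H, v0; niter=4)
    vs = [normalize(v0)]
    T = zeros(niter, niter)          # the Hamiltonian projected into the Krylov basis
    for n in 1:niter
        w = apply_H(vs[n])           # the key step: hit the current vector with H
        for (m, v) in enumerate(vs)  # orthogonalize against the basis built so far
            T[m, n] = dot(v, w)
            w = w - T[m, n] * v
        end
        n == niter && break
        T[n+1, n] = norm(w)          # (a full code would guard against norm(w) ~ 0)
        push!(vs, w / T[n+1, n])
    end
    evals, evecs = eigen(Symmetric(T))
    # lowest eigenpair, with the eigenvector expanded back into the original space
    return evals[1], sum(evecs[m, 1] * vs[m] for m in 1:length(vs))
end

# Usage sketch: for a dense matrix H, apply_H is just v -> H * v
H = Symmetric(randn(50, 50))
E0, gs = lanczos_ground_state(v -> H * v, randn(50); niter=8)
```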
So it's not that I'm using that; I'm just saying, if the Hamiltonian has that form, like if it's the Heisenberg chain or something. If it's not, you have to do more complicated things. Yes, an MPO is one of those things, yeah. So you can do an MPO; you don't have to. But for some things an MPO is really something you have to do, or it's the best way, like if you have long-range interactions or something. So there's definitely a whole more sophisticated set of things you can do here to make these Hamiltonians. But let me just say that these slides were just to give you a flavor of why I'm only going quickly through this part: because how to optimize matrix product states is really a big lecture in itself. And there are many things that could be said, but basically, hopefully that gives you a flavor. You take the ability that I mentioned in the first few slides, of being able to efficiently compute local terms, and you kind of operationalize that into getting these reduced forms of your whole many-body Hamiltonian, down to just one or two tensors of your MPS. And then you have all the pieces you need to use, in the core, Lanczos or Davidson, these very fast algorithms that can give you big decreases in energy, that can really improve your wave function a lot. Then when you're done, you switch to the next tensor and optimize that the same way, and so on, and so on, and so on. So you can get really good results this way. It's much more efficient, again, than treating all the parameters in your MPS as just some collection of adjustable parameters and doing gradient descent on them or something like that, which would be very slow. So this can be much, much faster than that. That's one of the key points. Yeah. Which one, like this right-hand side here? Well, that has two, and then the bottom part is one of the three at the bottom. There's one, two, three on the bottom, and one, two, three on the top. Well, in this one, I mean, there are a lot of details missing here, of course, but here this gray one is meant to indicate all the terms in the Hamiltonian on the left, all wrapped up in that part of the MPS. And then the middle one is just an identity operator on site three, in the example I was doing, and then the right one is some kind of rotated form of the identity on the right, in this case, to be specific about what I meant here. So basically, you have the terms where they're all far away on the left, then they finally start crossing the bond, then they're over on the right. That's what's left over here. So the point is that it's always some small number of terms that you can reduce it down to, and you can be smart about how you do it, where now, as I move over to the next site, I can save a lot of these pieces from before, and I don't have to recalculate everything again. I just advance them a little bit. I had these things that were every term up to site three, and then I just include the term on site four, and now I have every term up to site four. So I can reuse that piece and just keep growing it as I go back and forth, if you're smart about it. To explain this in full is a whole lecture, but I just wanted to touch on it. But hopefully you see how this is a valid form of that effective Hamiltonian, because you can see it has three indices on the bottom that I'm able to stick that red tensor into, and then what's left over is three indices on the top. So I get a new red tensor like that, and I can iterate this.
So anyway, I was debating whether I should even get into these details, because it's sometimes worse to show it halfway than the whole thing, but it's mostly to give you a flavor that there are some non-trivial steps in doing DMRG. I just wanted to give you some flavor of how that algorithm works. So we have this algorithm that lets us update a matrix product state one tensor at a time, or two tensors at a time. And if you do it two at a time, you can adapt: you can grow the bond dimension automatically and let it grow or shrink as needed. You can get really accurate results. So I just wanted to spend a few slides now on saying, OK, there's DMRG, and there are nice packages you can use to do it. I'll flash one of them, the one I work on, in a little bit. So let's say we have this algorithm in hand in some form; you can use one of these packages I'll mention later. What kind of results can you get, and what kind of systems can you apply it to? So as far back as 1993, which is around when DMRG was invented, you could already get some very impressive results. So this was a study done by Steve White, who invented it, with David Huse, of the spin-one Heisenberg chain. And this was at a time when, I think, maybe I'm wrong about this, but I believe there was still some debate even about whether there was a finite gap or not, or at least what exact size it had. This is the famous Haldane gap, which is predicted to be there from some kind of topological effects. So while this debate was happening, in comes this method, which gets the ground state energy to however many digits that is, like nine digits or something, and then gets the gap to five digits. So this method was pretty impressive already from day one. And this is back in 1993. So you can imagine you can do a lot more even more recently. So now one technology we have is you can run DMRG in parallel. This is just showing an animation of that. So this is some work I did to parallelize DMRG, where basically you have, in this case, four different walkers, and they've each been assigned a patch of the system, and they each walk over their patch, locally improving the matrix product state. And the blue dots here are the local energy. So that's just the expectation value of S dot S on each bond. And so you see there are some boundary effects; it's an open system. But then in the middle it gets very flat, once you're past the correlation length, past the edge of the system. And then a key point about it, one reason I wanted to mention this on this slide, is that convergence for a gapped system can be very, very fast. So toward the end of a calculation, you can get basically exponentially fast convergence, where in this case I only did four sweeps. And you can see the energy changes a lot at the beginning, but toward the end you're just sort of putting extra digits onto the energy. So it gets very, very fast at the end. You can't really expect that from most algorithms. So it's kind of a special algorithm that way, in terms of just how fast it converges in the best cases. So you can see now it's just kind of cleaning up these smaller digits. Yes. So this algorithm needs to be studied more along the lines you're saying, actually. So one problem is, it was an interesting idea, but it's been kind of missing a key use case. Because basically the serial version works too well, kind of. And so all the 1D problems had already been pretty much cleaned up.
And then 2D has problems that this can't fix. So 2D has problems to do with scaling that are just fundamental, that you can't beat just by parallelizing very well. But now there's a good case where I want to try this more, which is quantum chemistry, which I won't say a whole lot about. I think I have a slide coming up about it briefly. But there you have just a huge number of sites you have to chew through, in some of the ways of doing it. If you want to have very good resolution of real space, you use a lot of basis functions. So then it'd be good to try this algorithm again for that. And indeed, it's a good point you're raising. It'd be very interesting to run this algorithm in the limit where every worker gets just one or two of the tensors. And it doesn't even sweep. It just stays there and just cranks away on that one or two tensors. And then you just have as many workers as you have sites, basically, and they're all just cranking away and sharing information with each other. I don't know, actually, how well it scales up. So definitely it works better for two workers than for four. But then after that, it shows pretty good scaling. But I don't know if that goes all the way to N workers. It's an interesting question. So yeah, maybe it doesn't, but maybe it can be made to. Yeah. So that was some discussion about applying it to 1D systems, briefly. It's been used quite successfully, I would say, in the last 10 years for 2D systems. So even though DMRG and matrix product states are really best suited for 1D, you can just kind of shoehorn them into 2D by these snaking path constructions. The best path to use really is not so much a snake as a zigzag: you go up the first column, then you jump to the bottom of the second column and go up, up, up, up, like that. This is from a really nice paper talking about actually transforming the basis in one direction. Someone asked earlier about momentum space. In 2D, it can be a good idea to use momentum space in one direction and continue to be in real space in the other direction. And this is actually one of the state-of-the-art methods in 2D. So even though it's not really that well suited to 2D in principle, in practice everything is having such a hard time with 2D that this can be competitive, even with quantum Monte Carlo, even without a sign problem in some cases, if you're clever about how you use it. And when you do have a sign problem, it's one of the only methods you can even try. So there's that. So it can be very powerful. So this is just showing how powerful it can be. So this was a nice study from 2007 using this kind of snaking path construction on the square lattice Heisenberg model. And they also studied the triangular lattice Heisenberg model and some other models in the same paper. And you can see that there had been some prior QMC, quantum Monte Carlo, work trying to estimate the magnetization very precisely, which had gotten the bounds down to these two dashed lines. But by a very clever combination of 2D DMRG calculations, combined with very clever thinking about finite-size scaling and aspect ratios of different rectangular clusters, these two authors were able to extrapolate very precisely and get much better error bars than the prior QMC work. I mean, since then, the QMC has been improved. But it wasn't obvious that this would even work this well. So it's an interesting paper.
What they actually found was that by doing different rectangles of different aspect ratios, you could actually kind of dial in the finite-size effects and make them go away. So you could basically have a very flat extrapolation. This red one is kind of the optimal aspect ratio. So you could get an extrapolation that's basically just straight into the thermodynamic limit. So that was pretty interesting. And then you can study all kinds of systems with DMRG. You don't have to just study spin models. You can study things like electron systems, including quantum Hall systems. So this actually isn't from a DMRG calculation, but it's illustrative of what you can get. But this is by a group that also later used DMRG to study these same systems. This is actually an exact construction, which is pretty interesting. This paper tells you how you can take all kinds of interesting exact quantum Hall states, like Laughlin states and other states, and basically write down, more or less by hand, matrix product state forms of them in 2D. And then you can actually make quasi-hole excitations and kind of map them out. So these are some densities that were measured from these exactly constructed quasi-hole states. But underneath this, there's a matrix product state snaking through. And what you do is you basically work in a basis of orbitals, continuum orbitals that go around a cylinder. You can see this is periodic if you look: that connects to that. And then you can do very interesting constructions of quantum Hall states on cylinders. So that's just another area of application, and probably there's more to do there. And then one other area of application, I think the last one I have, is quantum chemistry. And this is an ongoing story. It's been very successful for quantum chemistry, and there's still a lot more that could be done, that needs to be done. Just to show you the kind of results you can get applying DMRG and matrix product states in this context, this is showing the local density of states. I should have an x-axis here. As you scan over different energies, I think the idea here is that you basically imagine shooting in a photon at different energies and then asking, could I eject an electron, basically? So you can resolve different core states this way. And then this is a table of values for a particular core state, where you're basically asking: what's the energy of a certain tightly bound electron of the oxygen atom of a water molecule? How much energy would I have to supply to remove that electron? And you can see that doing DMRG in different basis sets, you can converge toward really high-quality basis sets. Basically, the more letters a basis set's name has, the more functions it has, and you're resolving the continuum more and more and more. That's what these crazy combinations of letters mean. It just means you're basically stacking more and more Gaussian functions on top of atoms to resolve the continuum. And then you can see that many of these compare very favorably to the experimental value. And some might actually be more accurate than the experiment, in fact. So you can really compare to experiment with this method. So in terms of things I won't have time to say today and tomorrow: I could say a lot more about optimizing tensor networks. I could say a lot more about other algorithms and types of tensor networks.
Let me just point you to some resources, at least one new resource that I'm working on. So this is a review article that I'm working on. But it's interesting because it's not a paper; I'm not planning to publish it. It's just a website, basically. But it's not even really a website. It's really a GitHub repo of a bunch of text files that I'm working on, which you can also help me work on, and which you can compile into a website or a PDF or whatever. So let me just give you a quick tour of this. So I'm calling this website the tensor network. And it's just meant to be a community resource where I'm just going to write down every known tensor network algorithm, every known fact there is about them. So let me just show you how that looks. So here's one of the pages, on TRG, which is an algorithm I'm not going to go way into. But this is an algorithm for contracting the partition function of a 2D classical model. So you take something like the classical Ising model on a 2D lattice at finite temperature, and you can write that as a tensor network. And then you can use this very interesting set of steps where you basically break the tensors into two pieces and then merge them back together in some complicated way, and coarse-grain to a new lattice. And you can iterate this over and over and over again. And so this page actually goes through every step of that algorithm. Later I'm planning to add some code and some other things. And it has references for you. Here's a page all about matrix product states. So I'm trying to give a nod to the applied math community, where people have started calling these tensor trains, just a different name for matrix product states. So I'm trying to mention that as well. And so it talks about: what is one of these things? What are the key concepts, the number of parameters? Then it goes through these different algorithms, like how do I get a single component of an MPS? How do I do an efficient inner product of two MPSs? That's the thing I showed in the earlier slides, this going-in-from-the-side algorithm. Here's an interesting one: how do I take an MPS of one bond dimension, capital M, and compress it into a matrix product state of a smaller size, making a controlled error? So what if I somehow did some optimization that blew up the size of my MPS, and I want to shrink it back down, but in an optimal way? There's a really interesting algorithm for doing this. And what you do is you actually compute reduced density matrices. Then you diagonalize them one at a time going backwards, from right to left, sticking the result on as you go. And in the end, once you've done all of them, the result is your new MPS. So there's this really interesting algorithm to do that; a minimal sketch of the truncation step at the heart of it is below. So just flashing that up there. Lots of references to review articles and to the original articles proposing all these ideas in the first place. And then one other page I'll show: there's the entire DMRG algorithm actually written out in steps for you. So it talks about some background, and then it basically shows you preparing your MPS, building up the Hamiltonian, the reduced Hamiltonian, then doing optimization of the first bond, breaking it apart, going to the next bond, advancing the Hamiltonian. It has all the steps and everything. So I'm planning to flesh this out a lot more, maybe even add some sample code. One other site I wanted to mention: at about the same time, Glen Evenbly had more or less the same idea, but his is more pedagogical.
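Here's that sketch: a minimal, hypothetical Julia illustration of one truncation step. It merges two neighboring MPS tensors and splits them back apart with an SVD, keeping only the m largest singular values; when the rest of the chain is in the proper unitary form, keeping the dominant singular vectors is the same as keeping the dominant eigenvectors of the reduced density matrix, and a full compression would sweep this step across the chain.

```julia
using LinearAlgebra

# One truncation step. Each tensor has indices (left bond, physical, right bond).
function truncate_bond(A1::Array{Float64,3}, A2::Array{Float64,3}, m::Int)
    Dl, d1 = size(A1, 1), size(A1, 2)
    d2, Dr = size(A2, 2), size(A2, 3)
    theta = reshape(A1, Dl * d1, :) * reshape(A2, :, d2 * Dr)  # merged two-site tensor
    F = svd(theta)
    k = min(m, length(F.S))
    err = sum(abs2, F.S[k+1:end])                # discarded weight: the controlled error
    B1 = reshape(F.U[:, 1:k], Dl, d1, k)         # new left tensor (unitary columns)
    B2 = reshape(Diagonal(F.S[1:k]) * F.Vt[1:k, :], k, d2, Dr)  # new right tensor
    return B1, B2, err
end
```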
So Glen came up with a site called tensors.net. And this one is really nice. So this one is aimed more at people in physics who are starting down this path of learning about these topics, and who just want to get started and focus on physics applications. I asked Glen if he wanted to include everything, and he's like, no, I just want to keep it aimed at physics grad students, basically. So this one has some very high-level information about tensor networks, but then it has things like DMRG completely written out in Julia, for example, or MATLAB, every step. And you can just copy the code from the website. So here are some similar diagrams, but then it just has code you can copy yourself and run. And it's not really using any other libraries, more or less, other than some standard Julia libraries. So that's really nice. So you could write your own DMRG code starting with Glen's Julia one. And it's probably already very efficient, and you could probably already use it to do research. So that's another resource I'd recommend. And then one other thing: let me just quickly go back to the tensor network page for one second. One thing I wanted to mention was that, like I said, this is an open-source review article. So what I mean is, if you click here on source, it takes you to a GitHub page where you can actually go into the source folder. And you can see every page, like this one, diagrams. And if you look at the raw text, that's this page. So you can see it says tensor diagram notation: "Tensor diagram notation is a simple yet powerful graphical notation," et cetera. That's coming from this file. So all I did to make that page is I wrote this text file, and then I have a little static site generator that just generates the website from that. So it's really easy to add content. So if you or someone you know knows anything about this topic, or if you see an error or anything you want to add, even a reference to one of your papers or something, you can just edit some simple text files like this. These are markdown files. And then send me a pull request, and I'll add it. And then you'll be credited, because GitHub tracks all the contributions from everybody. So the idea is I'm trying to make it so everyone has some kind of academic incentive to actually help out. So it's like this ongoing review article that I'm going to try to write with the community. One other resource I want to mention is ITensor, which is this whole software package that I maintain and develop. And ITensor does a lot of things. It has all these facilities for doing low-level tensor operations, but it also includes an entire DMRG front end. So this is an entire code to do DMRG, right here on the front page. Basically, you can input all kinds of complicated lattice models almost in a form that's similar to how you write them on paper. You just say S+ on site j times S- on site j+1, et cetera, Sz on j times Sz on j+1. And there's a little system that will compile this into what's called an MPO, which is a form of the Hamiltonian that DMRG understands; it's like a tensor network for a Hamiltonian. And then you just set up a wave function, you say how many passes or sweeps you want it to do, which is how many times it goes left and right back across the system, and then you just say run DMRG on it. A sketch of what that looks like follows below.
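Something along these lines, modeled on the ITensor examples; this is a sketch assuming a recent Julia version of the library, and details like OpSum, random_mps, and the dmrg keyword arguments have varied a bit between versions, so treat it as illustrative rather than exact:

```julia
using ITensors, ITensorMPS

N = 100
sites = siteinds("S=1", N)

# Heisenberg chain, entered term by term, close to how you'd write it on paper
os = OpSum()
for j in 1:(N - 1)
    os += 0.5, "S+", j, "S-", j + 1
    os += 0.5, "S-", j, "S+", j + 1
    os += "Sz", j, "Sz", j + 1
end
H = MPO(os, sites)   # compiled into an MPO, the form of the Hamiltonian DMRG understands

psi0 = random_mps(sites; linkdims=10)

# nsweeps: passes back and forth across the system;
# maxdim: bond dimension cap per sweep; cutoff: truncation error
energy, psi = dmrg(H, psi0; nsweeps=5, maxdim=[10, 20, 100, 100, 200], cutoff=1e-10)
```

And afterwards you can measure local properties of the optimized state, for example something like expect(psi, "Sz") for the magnetization profile.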
And then when you're done, psi, this object psi here, will be an optimized matrix product state that you can then use with the steps I mentioned at the beginning of the talk to, say, extract local properties efficiently. You just stick tensors together and you can, say, measure the magnetization or other correlation functions, things like that. So there's a whole system for doing that; a lot of it's already made for you. OK, so that was just some resources I wanted to quickly mention. So check those out. So there's the tensor network site, ITensor, tensors.net. So lots of stuff out there you can read about. So any more questions about tensor networks? Because I'm going to shift gears now. Let's see, OK. I'll have to skip some stuff in this part, actually, because I didn't quite time this right. OK, so just to get ready now for the lectures tomorrow, and maybe I can spill some of this into tomorrow if we don't get that far, I just wanted to give a really broad introduction to machine learning, so we'd all be ready to talk about that tomorrow. Plus I thought it would be generally useful for all of you. So let me just get an idea. Who's heard of supervised learning? Who knows that term? Who's heard of unsupervised learning? Kind of the same people, right? Who's actually programmed neural nets, or a linear classifier, or something like that? OK, not so many. So I think this will be helpful to you guys. All right. So as you all know, machine learning has really taken off in recent years in terms of what it can do. Probably taken off too much; there's probably some hype as well. But we've seen the results, right? So progress in natural language, actual demos of self-driving cars. The real impact will probably come more in quiet ways in our lives. We'll probably see the quality of medical diagnoses just start to improve a lot; there are companies working very hard on that. One of my friends, actually, he was a postdoc at Rutgers doing DMFT; I remember meeting him here at one of these schools. Now he works, this is this guy Chuck Yee, he works for a company in New York City that does machine learning on medical data, actually. So this is something that, if you're looking for a career using all of your physics numerical optimization skills, this is a good career to get into, actually. So he's making six figures living in New York, now doing that. And then for us directly in science, machine learning could have a big impact on studies of materials, chemistry, even strongly correlated electrons. The hype is getting so big that Google has basically rebranded itself as a machine learning company now, instead of a search company. And so they have these articles coming out saying things like, the people who now sit closest to the CEO are actually the Google Brain AI team. Not the search team, you know. And they replaced their Google Translate efforts with a neural network system, actually. And they have a team in LA, mostly consisting of physicists, working on a quantum computer, and they're very interested in quantum machine learning, which I'll talk about a bit tomorrow. So that's just to show you how much is going on. And machine learning is already being shown to be very useful for the kind of physics that most of us in this room are interested in. So things like identifying phases of matter.
So this is a paper from 2016 by Juan Carrasquilla and Roger Melko, where they trained a multi-layer neural net to figure out from data what the order parameters should be and then report what phase you're in, right? Also, there's an interesting set of papers by the group of Pankaj Mehta and Anatoli Polkovnikov. And this is mostly led by this guy Marin Bukov, who's now a postdoc at Berkeley, I think, along with Alexandre Day and Dries Sels. So they have this nice work where they're basically asking, can we approach quantum control theory from a machine learning perspective and think of it as what's called a reinforcement learning problem? So I encourage you to take a look at their work. And it's getting used in other areas, too. So things like materials discovery, also high-energy physics. This guy Kyle Cranmer at NYU is doing a lot of interesting work. So they're actually trying to constrain theories, or maybe even discover new particles, by sifting through huge amounts of data using machine learning. So that's some interesting work. Let me fly through this part, because some of you have probably seen this before, but let me mention a few of these. So what are some examples of why people are saying machine learning is a big success right now? So a key example was this one, called ImageNet. And ImageNet was a very challenging benchmark that consists of 1.2 million training images and 150,000 test images. So you're given this huge dump of images, and you're given 1,000 categories, and your job is to sort these images into the correct categories, just by being given these labeled training images. So this had been going on for some years. And then this paper came out in 2012, and there was a competition that year. And that year, this paper got only a 15% error on the test set. So it trained on the training set, then they tried the same model on the test set, and it only got 15% of the labels wrong. Which is a lot, but it's pretty impressive given how complicated the task is. The next best entry that year got 26% wrong. And in the years prior to that, things had been creeping along: it had been 28%, 27%, 26%. Then boom, this thing jumps all the way down to 15%. And now they've gotten it down to, I think, only a few percent, if that. And you can see how hard this task is. You're given pictures like this, which is a Dalmatian sitting behind a bowl of cherries, and the correct label is cherry. That's really hard. But then there are ones that are less tricky, like this one. But still, you have to have a system that can recognize that's a container ship. And it says, yes, container ship. So this is a really tough task. Let me skip this one. It's cool, but I'm running out of time. There are other things you can do. You can actually use machine learning to generate things. So that ImageNet task is called discriminative learning, where you look at things and you discriminate; you say, what label should this have? But you can also generate. This is using this technology called GANs; more generally, this is just using deep convolutional neural networks. And what you can do is create neural networks that map from some high-dimensional space of images to a lower-dimensional latent space, a hidden space. And then you can do arithmetic in this latent space. So in this smaller hidden space, which you can kind of think of as the space after you've done RG or something,
you can find that there's a vector that's like the typical vector of all images of men wearing glasses, all images of men not wearing glasses, and then all images of women not wearing glasses. Then you can just do this arithmetic of saying, OK, if I take the man-with-glasses vector and subtract the man vector, then in my head, I'm thinking, OK, maybe I found the glasses vector. And if I add the glasses vector to the woman vector, maybe I'll get women with glasses. And that's actually exactly what you get. And it even handles cases like when the woman turns her face a little bit to the side or not; the glasses stay in the correct position on her face. It doesn't always work perfectly. Sometimes the glasses are a bit see-through or something. And that one, it kind of looks like she has heavy makeup or something. But the other ones, it works. So the fact that this all works tells you something very interesting might be going on. There's some kind of hidden structure in some high-dimensional space, and you can do arithmetic on it. So that's very interesting. And basically, you can succeed at tasks that people just thought computers might never do, like beating humans at Go. But now it's just destroying humans at Go. So all this has to do with the fact that somehow people are getting a handle on some kinds of high-dimensional spaces, and we're able to think about how to program computers in some different way. So what is machine learning? Can we try to summarize what's going on? And so here's my best attempt to summarize the core idea behind all this. It's basically data-driven problem solving. So the idea is, we want to say it's any system that, if you just feed more data into it, gets better and better and better at some task. And we want to automate that process. It's kind of like getting the human out of the loop. So the idea is that the human helps to feed the data in, but then the system takes the data and knows what to do with it to solve the task. And the human doesn't have to really understand what the system did in detail. And so it's really just some general framework or philosophy; there are lots of different methods, and I'll touch on what some of the different methods are. Another bit of philosophy that I like is this article by this guy who's now the director of AI at Tesla. So he says that it's almost like a new evolution in software. And I think this is actually pretty accurate. Think about original software; I mean, go one step even further back. What was software 0.1? The earliest version of software was: you had computers, and they could add numbers together. And so you would say, well, what numbers should I add to make the computer do a certain thing that I want? But then people said, wait, we can turn this into concepts like if and then and else. And we can have arrays, and we can have languages like C that take these manipulations of binary numbers and turn them into things humans can think about, like logic, right? But this is kind of like one step past that. This is like saying, OK, we went from just manipulating numbers on chips to logical operations, which is what's up here. Now we can even think of it as: hang on, we don't even need to tell the computer each step it should be doing. We just describe a generalized collection of steps, like multiply things by numbers, take those numbers and plug them into some functions, and so on and so on and so on.
So, continuing that Software 2.0 picture: then what we do is tell the computer how to program itself. We say, here is a way to take a program and replace it by a slightly better version of that same program, and then the computer does all the rest. Basically, we just tell it how to compute a gradient, the computer computes that over and over again, and it programs itself. So the idea is you get the humans even further out of the loop, out of the programming itself. The computer programs itself, and all we do is describe families of programs and their gradients, something like that. It's a kind of higher-order programming. The good thing for you is that this kind of programming uses a lot of math skills, calculus and complicated optimization; it's a lot like varying wave functions. So this is a possible job opportunity for all of you.

OK, so let's go through some basic concepts of machine learning; I'll fly through this part. One basic concept is a data set. Very simple: you're given a bunch of data, you're given categories or labels, and very often the data set is divided for you (or you can divide it yourself) into a training set and a test set. The idea is that you can do whatever you want with the training set, but the test set you're only supposed to look at once, in principle, at the very end of your study. If it were a double-blind study, you would never see it; some other group would be holding onto it for you, or maybe the referees would have it and apply it to your model. In the real world, I think, you sometimes snoop on it and look when you're not supposed to. What you do to prevent that snooping, where you're secretly bringing the test set into the loop, is to make what are called validation sets: you say, this is going to be my mini test set within my training set. You train on the blue part, then test on the validation part, and you can do that as many times as you want; that way you never have to touch the real test set. Then you can do what's called cross-validation, where you train with this fold held out and test on it, then train with these folds and test on that one, and so on. By using different validation sets, you get an idea of how well this is likely to work in the real world, where you don't get to see all the data. So it's a good idea to have multiple validation sets (there's a small code sketch of this below).

OK, so at a very high level, you can think of machine learning as dividing into different kinds of tasks, and these broad classes are: supervised learning, unsupervised learning, and reinforcement learning. You can think of these as corresponding to how much knowledge you have about your data a priori; by a priori I just mean upfront. In supervised learning, you have a lot of information: you have your data in hand, and you have labels telling you this is a cat, this is a dog, this is a container ship, or whatever those things were. In unsupervised learning, maybe you just have some data, but you don't even know what it's about: you're given a lot of configurations of some physics model, and you don't know whether it's in an ordered phase, you don't even know whether the model is 1D or 2D. You just have a lot of data. Reinforcement learning is the most extreme case, where you don't even have data at the beginning. You just have a world, and you have a way of collecting data.
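Here's that cross-validation sketch: a minimal k-fold split using only numpy. The function names and the `fit`/`score` interface are my own placeholders for illustration, not anything standardized from the lecture.

```python
# Minimal k-fold cross-validation: shuffle the sample indices once, split
# them into k folds, train on k-1 folds, validate on the held-out one, repeat.
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(X, y, fit, score, k=5):
    """fit(X, y) -> model; score(model, X, y) -> float (placeholder interface)."""
    folds = k_fold_indices(len(X), k)
    scores = []
    for i, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train], y[train])
        scores.append(score(model, X[val], y[val]))
    # The spread across folds hints at how the model might do on unseen data.
    return float(np.mean(scores)), float(np.std(scores))
```

The point of keeping several folds is exactly the one above: the fold-to-fold variation of the score is a cheap estimate of real-world performance, obtained without ever touching the test set.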
So in reinforcement learning, you're just starting out: you go into some dark room with a flashlight, and you scan it around, trying to get information about the room, and then you make decisions about how to proceed through it. So it's one step back even from unsupervised learning. Now, I'm going to have to end pretty soon, so I'll wrap the rest of this up tomorrow, but I think I have time to go through these three tasks, and then we'll keep the rest for tomorrow.

OK, so let's go through these tasks and describe what they're all about. The one that's going to be the most important, because it's where the technology is making the biggest progress right now, and the one I'll talk about the most, is supervised learning. In supervised learning, your task is to find a function f, called the decision function, whose job is to discriminate between different inputs. Let's say you have two kinds of inputs in the little mini-universe of this problem, type A and type B. Type A could be pictures of alligators and type B could be pictures of bears. So maybe the function is telling you what to do when you encounter this animal versus that one: should you run away, or should you play dead? What's your strategy for surviving the encounter? There are different ways of signaling the output. The simplest is: if the output is positive, the function thinks it's an A; if negative, it thinks it's a B. It could also be a vector-valued output, if you want more than two categories. And the way you often find this function is this: pick a function you can parameterize, one with adjustable parameters, and stick it into some cost function, some kind of objective. The cost just tests your function on every training input you have and measures the distance from the output of f to the ideal output, which you call y. So y_j is the label: plus one for A or minus one for B, in this case. That measures how good the function is, and however good it is, that determines this number C. Then you optimize: f has adjustable parameters, and you keep adjusting them to make C go down, however you do that. So the cost measures the distance of the trial function from the idealized function, the one that gives the perfect outputs y_j for the inputs. Think of a function as a very high-dimensional object: there's the perfect function, there's the f you have, and you're trying to bring them closer together. From that point of view, it's just high-dimensional fitting (there's a small code sketch of this setup below). There's a part that goes beyond fitting, though, and that has to do with generalization.

Unsupervised learning could be many things; it's sort of anything you can do with unlabeled data. One thing you could do is notice that the data has repetitions, or that some things in the data are similar to others, so there's some kind of probability distribution behind it; then you want your function to match whatever hypothesized probability distribution there is. Or it could be that the square of your function matches the probability distribution. You could also find clusters in the data. This third one is rather different from the first two: you say the data has some kind of structure, and maybe you want to find that structure and lump different things together, and that's your task.
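Here's a minimal sketch of that supervised setup, assuming a simple linear form for f and a squared-distance cost; the linear choice, the learning rate, and the names are mine, just for illustration.

```python
# Minimal sketch of supervised learning: a parameterized decision function f,
# a cost measuring distance from the labels y_j = +1 (type A) or -1 (type B),
# and gradient descent on the adjustable parameters. f is chosen linear here.
import numpy as np

def f(x, w, b):
    return x @ w + b                         # positive -> A, negative -> B

def cost(X, y, w, b):
    return np.mean((f(X, w, b) - y) ** 2)    # distance from the ideal outputs

def train(X, y, lr=0.1, steps=500):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        err = f(X, w, b) - y                 # residual on each training input
        w -= lr * 2 * (X.T @ err) / len(y)   # gradient of the mean squared cost
        b -= lr * 2 * err.mean()
    return w, b
```

Each gradient step nudges the parameters so the outputs f(x_j) move toward the labels y_j, which is exactly the "bring the trial function closer to the perfect function" picture described above.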
Or you could even just be doing preprocessing: taking high-dimensional data and finding some reduced or compressed form of it that you can then use for other things, like the other tasks here or the supervised learning from the previous slide.

The approach for the first task, the probability one, is that you call the function you're trying to optimize P. So there's the P you're optimizing, and then there's the "true" distribution of the data, which you can think of as taking your data and putting a delta function of probability on everything you've observed. That's your perfect probability distribution, in a sense, but it's an overfitted one. Then you say: I have the P that I can actually optimize, and I want to measure some distance between these two. One way to do that is this thing called the Kullback-Leibler divergence, which equals the formula up here, called the log-likelihood. The idea is that you take every input you have, plug it into the P you're trying to improve, take the log so things are more manageable, and then maximize the sum of these logs, which is equivalent to maximizing the product of all the probabilities. So you want to maximize the probability of jointly observing all the data you have; you want that probability to be high. And that's equivalent to minimizing the distance between your distribution and the true one. So that's what you do there (there's a sketch of this objective just below).

The last one I'll mention, and then I'll stop for today, is reinforcement learning. There are a lot of different ways of defining it, and different forms of it, but here you have some kind of environment. The environment is like a maze, or some kind of world you're exploring; it could even be the actual world. Maybe you have a glider that you throw up into the air, and you want to explore the wind currents. So you have an environment, and you have an agent: the agent could be that glider, or this little robot in this maze, or a game player in some game state. Then you have states s_n, which are like where you are in the maze, or the current angle of the glider and the wind it's feeling. And you have actions, like tilt the wings of the glider, or go up, or go left in the maze. And then you have a reward. The reward could be distributed all throughout the environment; maybe throughout the maze there are pieces of cheese or there are traps, so you have positive rewards and negative rewards. Or it could be that the reward only comes at the end of the maze, and you get no information at all until you actually get out. Those cases are very challenging, but you can actually make progress even on those kinds of problems. Your goal here is to determine a policy, and policies are usually thought of as probabilistic: if I'm in state s_n, I want some little probability distribution over actions, basically a probability of going north versus west versus south versus east, so that if I follow this policy, I maximize the reward in the fewest number of steps.
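Two more small sketches before the game example. First, the log-likelihood objective just described, for a toy model where P is a categorical distribution over a handful of outcomes (the softmax parameterization is my choice, not from the lecture). Second, a policy in the sense above: a function from a state to a probability distribution over the four actions, with a placeholder linear scoring.

```python
# Sketch 1: the log-likelihood objective. Maximizing sum_j log P(x_j) is the
# same as minimizing the KL divergence from the empirical (delta-functions-
# on-the-data) distribution, up to a term independent of P.
import numpy as np

def log_likelihood(theta, data):
    """theta: unnormalized log-probabilities of each outcome (adjustable);
    data: array of observed integer outcomes x_j."""
    m = theta.max()
    logP = theta - (m + np.log(np.sum(np.exp(theta - m))))  # log softmax
    return np.sum(logP[data])                # sum_j log P(x_j); maximize this

# Sketch 2: a policy, i.e. a state-conditioned distribution over actions.
ACTIONS = ["north", "west", "south", "east"]

def policy(state_features, W):
    """state_features: a vector encoding the state s_n (placeholder encoding);
    W: a (num_features x 4) matrix of adjustable parameters."""
    scores = state_features @ W              # one score per action
    probs = np.exp(scores - scores.max())    # softmax, numerically stabilized
    return probs / probs.sum()               # e.g. [P(N), P(W), P(S), P(E)]
```

Training a policy then means adjusting W so that actions sampled from these probabilities tend to collect more reward; the Pong example below does exactly this, with a small neural network in place of the linear scoring.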
And with reinforcement learning you can actually do things like this: you can watch or play a game of Pong observing only the pixel states, knowing only that you want the score to go up, and otherwise knowing nothing else about the game: not even which paddle is yours, or what the ball means, or what happens when it hits the boundaries. You only watch the score. And you can train a little neural net to take in these pixel configurations and output a probability of moving up, and then train that net to get better and better at outputting probabilities that make the score go up. And that actually works. There's a nice blog post I can point you to that shows how to do this in Python, so you can really try it yourself.

So that's a quick pass through the three different types. Tomorrow I'll say a bit more about them, and then we'll get into using tensor networks to do the first one, supervised learning; I'll also have some slides on doing unsupervised learning with tensor networks. OK, thanks.