Excuse me, I have a bit of a cough. I already said this, but for people who just came in: if you have Jupyter set up locally on your computer, you can download today's notebook at this URL. It should be up on the program page with the other two shortly, if it's not already. I don't know if the secretary has posted it quite yet. It's already there. OK, so it's on the program page as well by the third lecture. We also doubled the amount of RAM in the server overnight. That was why it was crashing yesterday, so hopefully it won't crash this morning. Still, if you can run it locally, it'll run faster for you and take a little load off the server for everybody else. So today we're going to be working through this opt-2DMC-2018-ICTP.ipynb notebook. And having just asked you to run locally, let me do that as well. So, has everybody got this up and logged in? Do I need to wait a minute? I don't hear any complaints, so hopefully it'll work. Today's lecture is about optimizing. Now, it says optimizing in Python; what does optimizing mean? It means making your code work better, except that's not really what it means. It means making your science go faster. I will mostly focus on general principles of optimization at the beginning of the lecture, and then we'll apply them over the course of the lecture in a running example, an Ising-model Monte Carlo. So for this lecture, execute this %pylab inline. You might remember we used %pylab notebook yesterday. This also imports NumPy and the plotting tools and so on, but it uses the inline backend instead of the notebook one. This one doesn't give you interactive plots when you make them.
They'll just be nice little plots in the notebook, but that's a little less load on the server, and we don't really need to interact with the pictures we're generating anyway, so sometimes it's nicer to work with this simpler backend. You'll see that it looks a little different as we go through. OK, so, optimizing in Python, but really just in general: I have three rules of optimization. I wouldn't quite say they're the three rules of Fight Club, because they're not all the same, but they are, I think, really important in exactly this order. The first rule of optimization is that your development time is worth a lot more than CPU time, especially when your PI is paying for the cluster access. Don't forget that it's true: even though they might complain about an extra couple hundred CPU hours, they won't really. They'd rather you get it done, and the bottleneck is usually you. So the first step is: make whatever you need to do work, simply and legibly. This is part of why I like Python. Legibly, because if your code is garbage to read, you won't be able to debug it, and you also won't be able to trust it, because it'll be very hard to tell whether it's doing what you think it's doing once it gets complicated down the line as it develops. So make it work simply and legibly. Write some simple test cases to make sure your code is correct. If you're doing a diagonalization study like we did yesterday, maybe work out separately, by hand or in a separate notebook, what the spectrum is for the two-spin case or the three-spin case, and then double-check that your fancy code, which handles the arbitrary n-spin case, matches when you apply it to the two- or three-spin case. These are simple little checks that you can do as you go, which will save you a lot of debugging time later. So make sure it's correct if you can.
If and only if it isn't fast enough to get the physics you need, then start optimizing the code. Otherwise just run it with your simple, beautiful, working, pure Python code. Second rule of optimization. Measure twice, cut once. I think that's actually a rule in carpentry, but it's very applicable. Measure twice, cut once. What does it even mean to measure whether your code is fast enough? It means you benchmark it. It means you profile it. We're going to do some of that in this lecture. That is, you figure out how long it actually takes in real time to do whatever it is you needed to do. And for the case that you want to run down the line, as you're developing, maybe you explore very small test cases. And then you know that eventually you're going to want to run some very big system to get good numbers or good statistics or whatever. Well, you just measure, work out how the thing is scaling with the size that you're attacking, and extrapolate, and then you'll know, oh, this is going to need 100,000 CPU hours. Actually, maybe I need to make it a little faster, something like that. So measure, then cut. So that really means not only do you need a benchmark to figure out how long it's taking and how it's scaling, but you need a profile. You need to figure out which piece of your code is actually the bottleneck. And you should do it by measuring it with the benchmarking tools, because you might do it by thinking, oh, this is the hard part. This has got to be what's slow. And then when you go and measure it, you'll discover, actually, that's really fine. And the bottleneck was somewhere you didn't even realize was hard for the computer. And maybe it was just because you did something in a slightly stupid way, and one line change will give you a factor of n, or something like that, some big factor in how the thing is scaling. 
That may or may not happen, but the number of times I've been wrong about what the hardest part of something was until I measured it is quite large. So again, if you spend a lot of time optimizing the wrong bottleneck, you're just wasting it. And your time is more important than the CPU's. Third rule of optimization: algorithms before constants. We haven't talked in too much detail about what this means. We've mentioned that things can scale as O(n) or O(1), when we were talking about how you can add things to an array or remove things from an array. What this means: you'll often find that computer scientists, particularly of a theoretical bent, like saying that the runtime of some algorithm is O of, I don't know, n cubed log n, something like that. This n is the input size. It's the size of the problem measured in some way, say how many bytes you need to describe the instance you want to look at. This is the scaling of the time it takes. And this O hides n-independent constants, which you don't keep track of because they vary. If you run the same algorithm on your laptop versus on a supercomputer, maybe the supercomputer just has a 10 times faster clock, so the constant is actually just 10 times smaller, as a simple example. So you don't normally write what those constants are. They do matter, and we'll see they can matter: if the constant is 100, that's a problem. Nonetheless, usually optimizing this asymptotic behavior, how badly it scales, is the first thing to do, and then think about the constants. Actually, if you really go hang out with complexity theorists, they are happy so long as the runtime t is poly(n), meaning some polynomial in n, as opposed to super-polynomial, where the standard case would be e to the something times n. But if somebody tells you they have an efficient polynomial-time algorithm, you shouldn't implement it unless they can tell you what that degree is.
Because if it scales as n to the 10, it's just as useless as an exponential algorithm for actually doing anything practical. So typically, when I say algorithms before constants, you want to make sure your approach is asymptotically efficient, that it scales well with the system size, before worrying about shaving off those overheads. As I write here, going from 10n to 5n is good, but going from 5n squared to 10n is way better, at least for big enough n. And usually, you want n to be as big as possible. So those are, I think, the three high-level rules of optimization, and if you follow them, you won't waste too much time. Now let's be a little more practical for numerical code in Python. There are a few general rules for numerical kinds of code. The first is called vectorizing loops, which means that, if possible, you don't want to write out explicit for loops, looping over all of the elements in, say, a big matrix like those matrices we were working with yesterday, for a number of reasons. In Python, it's just slow to have explicit for loops. If the for loop is over 10 things, it doesn't matter; but if the for loop is over 2 to the 10 things, then that's not a good way to implement your program. We'll see what I mean by vectorizing. Even in other languages, being able to write things that iterate over all the elements in an array in a vectorized fashion is actually important, because modern CPUs, if they know they're doing the same thing to every element contiguously in some block of memory, like in an array, will actually do it in batches. If you write an explicit for loop, the computer won't know to use those special instructions that allow it to work in batches. That also gives you big improvements in speed. So it's not just in Python and MATLAB that it's a good idea to think in a vectorized way. We'll explain that a little more.
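To make the idea of vectorization concrete, here is a minimal sketch, not from the lecture's notebook, comparing an explicit Python loop with the equivalent NumPy vectorized expression; the function names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(100_000)
b = rng.standard_normal(100_000)

# Explicit Python loop: one interpreted iteration per element.
def dot_loop(x, y):
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total

# Vectorized: NumPy does the whole multiply-and-sum in compiled code,
# which also lets the CPU use its batched (SIMD) instructions.
def dot_vec(x, y):
    return np.sum(x * y)

# Same result, but the vectorized version is typically orders of magnitude faster.
assert np.isclose(dot_loop(a, b), dot_vec(a, b))
```

Timing both versions with %timeit in a notebook makes the speed difference vivid.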
Avoid accidental copies. If you have really big arrays, you don't want to accidentally copy them, because then you waste a lot of time copying them, and if you're memory-constrained, you also waste a lot of memory. That's why I talked yesterday about how certain kinds of slices into arrays are views of the array: they're basically just a little thing that says, I want to look at these sub-elements, but don't actually make a copy of those elements. Other kinds of slice indexing techniques create copies of the elements. So when you're getting into big stuff, you want to know whether you're creating copies, and you don't want to do it by mistake. Contiguous memory is faster. Essentially, you won't need to worry about this: so long as you're writing things with Python and NumPy, it'll all be contiguous unless you know what you're doing and make it non-contiguous intentionally. This is related to the fact that modern CPUs work on blocks of data, so having things close together in memory makes them run faster. And then, especially in Python, use the high-level toolkits that SciPy, NumPy, and other mature scientific libraries provide. A lot of those have pre-compiled algorithms that are mature. Mature means hopefully bug-free. It doesn't mean bug-free, just hopefully bug-free, and optimized. For example, SciPy provides a module with a bunch of tools for taking fast Fourier transforms. Use it, even though you could implement your own fast Fourier transform. It's not that hard to implement, but the one in there has been optimized; it's compiled; it will run faster than the one you are likely to write unless you spend a lot of time optimizing your own. So use those high-level toolkits. Then, as I say here, there's the last resort, and actually it's very rare you need it if you follow these rules.
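The view-versus-copy distinction can be checked directly. This small sketch (not from the notebook) shows that basic slicing gives a view while fancy indexing gives a copy, which you can confirm with np.shares_memory:

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

view = a[1:3, :]       # basic slicing: a view, no data copied
copy = a[[1, 2], :]    # fancy (integer-array) indexing: a copy

a[1, 0] = -99
assert view[0, 0] == -99   # the view sees the change in the original...
assert copy[0, 0] == 4     # ...the copy does not

# np.shares_memory confirms which is which
assert np.shares_memory(a, view)
assert not np.shares_memory(a, copy)
```

For big arrays, accidentally triggering the copy path can dominate both runtime and memory.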
The last resort is to identify, after all your benchmarking and all your algorithmic enhancement (changing algorithms is what changes the asymptotics, how the time scales with input size), some section of your code where you're doing basically the right thing, but it's just too slow: a bunch of for loops you couldn't vectorize, in pure Python, not compiled, not really fast enough, and you think you can get a factor of 10 or maybe 100 from actually compiling it. And once you've identified that that's your bottleneck, evaluate whether you really can't wait out that extra factor of 10. You can say, oh, I can make it faster; or you can say, oh, I can throw it on the cluster over the weekend and go to the beach, and when I get back it'll be done, and I only need it for my final few data points, so maybe it's still not worth doing this, right? But if you really need it, then it is very easy, although I won't explain how in this lecture, easy in a broader sense, to weave in say one function that you rewrite in a compiled language like C, and just have that called from Python instead of your Python function that does the same thing. This slide says using, for example, scipy.weave. Unfortunately scipy.weave is no longer working, because the person who wrote it 10 years ago doesn't maintain it anymore, and the community has decided that Cython and some other approaches to weaving things in are better. So anyway, I liked it, I used to use it, but there are other ways to do it.
And one nice thing about that is that if you have a working code base that's all Python and you can profile and use all the tools and debugging and all that stuff and then you decide this little block is the only thing I need to compile, then you can have a version of the code which calls it with the Python and another version you develop which calls it with your C version that you're developing and you can just write some test cases that make sure they do the exact same thing, right? And just say run it with this side, run it with this side for small instances, check they're identical and then once you're convinced you know that your C version or whatever compiled version which will usually look more complicated actually is doing the right thing. So that's obviously a more advanced technique but like I said, it's the last resort you really mostly won't need it if you can do the other stuff effectively. Okay, so any questions about these general principles? My philosophy, my philosophizing 9 a.m. Grandstanding, yeah. Will I show how to weave in some C? No, not in the lecture, no. Look, it's a last resort and the reason I say that is because it's a pain. It is not hard to do once you know how to do it but any tool you use to import an external language that you compile means you do have to learn something more about the way that Python and that language exchange data and you just have to go through and do it. I could, well, I don't have it prepared for the lecture. Once you know how to do it, it can be a very small block that just really is not hard to do and you run some tool, it compiles, it all just works and it actually does work and I've done it but there's a lot more you have to know about how to connect the two languages which is why you shouldn't do it unless you really have to. I guess once you've done it once then it becomes easier to do but then you also might start doing it before you really should. 
All right, for the rest of the lecture we're going to explore this model, a classical 2D Ising model. Here it is. This is a reduced Hamiltonian. These sigmas are plus-or-minus-one spins. There is a ferromagnetic coupling and a longitudinal field, okay? We're going to do it on the square lattice. I didn't put a picture in the notebook, but there is a square lattice, and each site of the lattice has a spin which can be up or down. They interact ferromagnetically. And I just want to remind people that if h is zero, so there's no field tending to push it in one direction or the other, then this model at high temperature, which corresponds to small J, small coupling (since this is a reduced coupling, it goes as one over temperature), is paramagnetic, and at low temperature, or large coupling, it's ferromagnetic. The critical coupling, which you can get on the square lattice from duality, I've just stored here: J_critical = log(1 + sqrt(2))/2, okay. So let's build up a Monte Carlo simulation of this. First we'll define a few global parameters. We're going to have parameters Lx and Ly for the x and y extent of this square lattice, and J and h for the two couplings, the ferromagnetic coupling and the field. I'm just setting them to something right now: J critical and zero. It's quite natural to store the configuration in a 2D array, because it's a square lattice and a 2D array looks a lot like a square lattice. So we're going to do exactly that: my configuration is going to be stored in this array variable called sigma, which is a 100 by 100 array with an integer data type. Some of you might say, why are you wasting so much space, four bytes for just an up-down value? And the answer is: because it's easier to write it this way, it'll run faster, I'll get the thing working, and it won't matter; I won't have a problem that comes from not packing it down to a single bit for this study.
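The setup just described might look like the following minimal sketch. The variable names (Lx, Ly, J, h, sigma) are assumptions matching the narration, not necessarily the notebook's exact code:

```python
import numpy as np

# Lattice extents and couplings (names assumed from the lecture's description).
Lx, Ly = 100, 100
Jc = np.log(1 + np.sqrt(2)) / 2   # critical coupling of the square-lattice Ising model
J, h = Jc, 0.0                    # start at criticality, zero field

# Spin configuration: 2D integer array of +1/-1 values, initialized all-up.
sigma = np.ones((Lx, Ly), dtype=int)
```

Using a full integer per spin trades a little memory for simpler, faster code, exactly the tradeoff the lecture argues for.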
Maybe there's a case where it would matter, but here it won't, okay. Excuse me. So here we can take a look at that matrix. What is this going to do? It's going to build a figure. I'm treating the matrix of ones as an image, and I'm telling imshow that the range of the numbers in my matrix should be thought of as running from minus one to one, so it will treat one as the highest value and minus one as the lowest value, and it will show me nice pictures as we go, okay. And there is the all-ones matrix. Now, to run a Monte Carlo we have to be able to calculate energies, so that when we make a move we can decide, from whether the energy goes up or down and by how much, whether we will accept or reject the move, right. So how do we calculate it? Here is one really explicit energy function. Take a look. How does this work? I have a function, I've defined it, it's called energy, and it takes as an argument sigma, which is the array. I start by saying I'm going to accumulate the energy in this variable here, E = 0. Then I iterate over all the horizontal bonds, the vertical bonds, and the field term to accumulate the energy. So: for i going from zero to Lx minus one. Let's see, for this example Lx is four and Ly is four, because I have a four-by-four lattice, so this is going to run i from zero up to two, not three, right, because I have Lx minus one. Then, for j over the full set zero, one, two, three, add to the energy minus J times the spin value at position (i, j) times the spin value at the position to the right of it. So this iterates over these three bonds on that first row and adds up their energies. Similarly, for the vertical bonds, over all columns I only need to iterate over three of the four rows: give me the energy that relates (i, j) and (i, j+1). And then for the field: there's a field term, if h is non-zero, acting on every spin in the lattice, so I need to iterate over all of them. That's my field term, that little squiggle, okay.
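The explicit energy function being described might look like this sketch, reconstructed from the narration (open boundaries, reduced couplings J and h; not guaranteed to match the notebook line for line):

```python
import numpy as np

J, h = 0.44, 0.0   # illustrative values near criticality, zero field

def energy(sigma):
    """Explicitly accumulate the reduced energy of an open-boundary 2D Ising configuration."""
    Lx, Ly = sigma.shape
    E = 0.0
    # Horizontal bonds: (i, j) -- (i+1, j), so i stops one short of the edge.
    for i in range(Lx - 1):
        for j in range(Ly):
            E += -J * sigma[i, j] * sigma[i + 1, j]
    # Vertical bonds: (i, j) -- (i, j+1), so j stops one short of the edge.
    for i in range(Lx):
        for j in range(Ly - 1):
            E += -J * sigma[i, j] * sigma[i, j + 1]
    # Field term on every site.
    for i in range(Lx):
        for j in range(Ly):
            E += -h * sigma[i, j]
    return E
```

On a 4x4 all-up lattice with h = 0 there are 12 horizontal and 12 vertical bonds, so the energy is -24J, which is an easy correctness check.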
Now, is it clear how this works? Yep. Yes, we'll get to it. Let's just make sure this is very clear. Surely this is basically what your computer is going to do: whether you write it like this or in a fancier way, under the hood it's just iterating over all those terms, right. But this is calculating the energy explicitly. Now, let's see how long it takes to calculate the energy of this 100 by 100 thing. I got 35 milliseconds. What do you guys get? 42, 85. Anybody running on the server? What about on the server? 58. 58 milliseconds. Okay, so is that fast or slow? That's one answer: if you can make it faster, then it's slow. What's another answer? Yeah, so 35 milliseconds. That means that if I had to do one energy calculation between every redraw of this 100 by 100 picture, I would get one second divided by 35 milliseconds, something like 30 frames per second, for my 100 by 100 picture, where all I'm doing is calculating the energy, not even making changes, okay. And that is appallingly slow, because your computer or your Xbox or even your iPhone can play 60-frame-per-second games where you run around and shoot everything in sight, right, with graphical blood and things flying around and whatever. And you can't even calculate the energy of a little 100 by 100 array in that time; 100 by 100 pixels on the display of your iPhone would be a little tiny thing in the top left corner. So in this case, although I said you should get it working, and working correctly, before you start optimizing, I'm going to say this is too slow, okay. I know it's going to be too slow. As I say, this is a bit long-winded and slow. First, vectorize any for loops you can, okay. This version of the code is the same; we'll explain it in just one second, and let's see how fast it runs.
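Outside a notebook, where %timeit isn't available, the standard-library timeit module gives the same kind of per-call measurement. The energy_sum function here is a simplified stand-in for the energy call, not the lecture's function:

```python
import timeit
import numpy as np

sigma = np.ones((100, 100), dtype=int)

def energy_sum(sigma):
    # Stand-in for the energy call being benchmarked.
    return float(np.sum(sigma))

# Time many calls and report the per-call average in milliseconds,
# which is roughly what %timeit prints in the notebook.
n_calls = 100
seconds = timeit.timeit(lambda: energy_sum(sigma), number=n_calls)
per_call_ms = 1000 * seconds / n_calls
print(f"{per_call_ms:.3f} ms per call")
```

Dividing one second by the per-call time gives the "frames per second" figure used in the lecture's comparison.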
67 microseconds, so I gained a factor of 500. Okay, that's a big enough constant that you might want to do it right up front, because we know we're going to call this energy function a lot. So how is this code working? This is vectorization. Slices are fancy; you can do fancy things with slices, right. You can pull out subarrays. In particular, what is this doing? This slice says: give me all of the spins up to, but not including, the last row. Row or column is a little confusing, because you usually think of matrices with row numbers and column numbers, but if you read it as i, j in an x, y coordinate system, you have a transpose of that. Anyway, let's say the first index is the row. So here, we're taking from zero to the second-to-last row, so zero to two, and every column, so all the spins there. In our four-by-four example, this would be a three-by-four array. And then we're doing element-wise multiplication with the three-by-four array given by the whole block shifted by one step. Right, and then this np.sum just takes a sum over the entire array you pass it. And then we multiply the whole thing at the end by minus the coupling. So this is the horizontal bonds, which were that first for loop; it's scrolled off the screen. The next line is the vertical bonds; it's very similar. To make it very clear: it is multiplying that set of spins by that shifted block of spins element-wise, adding them all up, and multiplying by the coupling. And the last one is just the sum of the spins. Because I've done it in this vectorized way, NumPy will actually do this with very fast, close-to-CPU-speed vector multiplication. So this is what's called vectorizing your code.
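The vectorized energy being described might look like this sketch (reconstructed from the narration; shifted-block products for the two bond directions plus a field term):

```python
import numpy as np

J, h = 0.44, 0.1   # illustrative values

def energy_vec(sigma):
    """Vectorized reduced energy of an open-boundary 2D Ising configuration."""
    # Bonds along the first axis: each row (except the last) times the row shifted by one.
    E = -J * np.sum(sigma[:-1, :] * sigma[1:, :])
    # Bonds along the second axis: same trick along the other direction.
    E += -J * np.sum(sigma[:, :-1] * sigma[:, 1:])
    # Field term: just the sum of all spins.
    E += -h * np.sum(sigma)
    return E
```

A quick correctness check: on a 4x4 all-up lattice this gives -24J - 16h, matching the bond and site counts of the explicit loop version.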
And generally, if you have a for loop which runs over the indices of some regular part, either the whole array or some regular part of the array, and does something element-wise on it, there's usually a way, if you think a little bit, to reformulate it in this vectorized fashion. It takes a little practice to get used to it. I'm sorry, what? This is integer, yeah. Because this is an integer matrix times an integer matrix, the star binds first, then the sum, and that'll be an integer sum, and then the J will turn it into a float. No, no, there are no fancy execution rules that will typecast forward into the expression, so you can always read what it's doing from the inside out. This is: take a view of this integer matrix, multiply it element-wise by this view of this integer matrix, then sum it, then multiply by the float. Now, a small syntax thing. You remember I said the other day that I could have a multi-line expression in Python if I put a backslash at the end of the line I wanted to continue. Here I didn't put one; that's because it's inside parentheses, so I'm allowed to do that, because Python knows the line is going to continue. Anyway, that's just a small syntax note. Okay, so this bought us a lot. It's also very compact. Once you learn how to think about vectorizing things like that, it'll let you write code more quickly too, and with fewer bugs, because there's just less code. Okay, so: faster, more compact. Let's write an update step. This is the core of our Monte Carlo flip sequence. Here's our docstring: loop through sites, and randomly flip with probability set by the relative Gibbs weight, okay? So, here's the current configuration. We calculate its energy. Then we iterate through all the sites: for i in range(Lx), for j in range(Ly).
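The inside-out evaluation and dtype promotion just mentioned can be verified in a couple of lines; this toy example is not from the notebook:

```python
import numpy as np

a = np.ones((3, 4), dtype=np.int64)
b = np.ones((3, 4), dtype=np.int64)

prod = a * b              # the star binds first: element-wise integer product
s = np.sum(prod)          # still an integer sum
result = 0.5 * s          # only now does the float coupling promote it to a float

assert np.issubdtype(prod.dtype, np.integer)   # nothing was typecast "forward"
```

So the float multiplication at the outside never changes how the inner integer arithmetic is done.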
Propose a flip: sigma[i, j], that particular spin, times-equals minus one; that flips it, right? It multiplies it in place by minus one. Then we compute the new energy, E1. We take delta E, which is E1 minus E0. We compute a probability according to the Gibbs rule: e to the minus delta E over e to the minus delta E plus e to the delta E. We get from the random number generator a uniformly distributed random number between zero and one, and if it is less than p, which means with probability p, we accept the flip and record the new energy. Well, okay, we don't even need to do this; we're not going to use it, but yeah, the new energy. Else, we reject the flip, and sigma gets flipped back. So here we flipped it, and here we flipped it back. This is a very explicit implementation of the Monte Carlo. Okay, let's see how it works. What did it do? Remember, I think I have my J from above set to the critical coupling. And you see that I got an image, the current spin configuration. This tried a flip at every site, so it was going to flip more than one thing; it's a whole sweep. Yep, isn't that yellow? But which is red and black? Presumably the default color map is set differently. I don't know why; I might have set yellow and blue as my default color map in a configuration file on my computer at some point. But that's all it is. You have to have some color map, which tells the image plotter how to go from numbers arranged along some axis to a color scheme, and I think my default is probably just different, that's all. But you can see on this little block of Ising model that there are a few spins that got flipped after this one sweep. Okay, because I created a global when we started. That's back here. Indeed, here it is, right? I created a sigma variable which is at the notebook level, right? And indeed, well, okay.
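The naive sweep being described, which recomputes the full energy for every proposed flip, might look like this sketch. The names and the acceptance formula follow the narration (it is not guaranteed to be the notebook's exact code):

```python
import numpy as np

rng = np.random.default_rng(1)
J, h = 0.44, 0.0

def energy(sigma):
    """Vectorized open-boundary reduced energy (as built earlier in the lecture)."""
    E = -J * np.sum(sigma[:-1, :] * sigma[1:, :])
    E += -J * np.sum(sigma[:, :-1] * sigma[:, 1:])
    E += -h * np.sum(sigma)
    return E

def mc_flip(sigma):
    """One sweep: visit every site, propose a flip, accept by the Gibbs rule."""
    Lx, Ly = sigma.shape
    E0 = energy(sigma)
    for i in range(Lx):
        for j in range(Ly):
            sigma[i, j] *= -1                     # propose the flip in place
            E1 = energy(sigma)
            dE = E1 - E0
            # Acceptance probability as stated in the lecture.
            p = np.exp(-dE) / (np.exp(-dE) + np.exp(dE))
            if rng.random() < p:
                E0 = E1                           # accept: keep the flip
            else:
                sigma[i, j] *= -1                 # reject: flip back
```

Each proposed flip costs a full O(n) energy evaluation, so one sweep is O(n^2), which is the scaling problem the profiling below exposes.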
So I created it at the notebook level. Also remember, when I call this mc_flip(sigma): in Python, all variables are references. So if I pass this array, which is the global one, into this function, actually this is a good point, the code maybe misleads a little about what's going on. This sigma here is actually local; it's the local variable inside this definition. But I've passed it the one from outside, so it refers to the same global array. And when I do this assignment, I'm actually changing that element of the array I created outside. So sigma is always the same one array in this code. Yep, it's all passed by reference, in the language of C. Okay, so there's our image, and let's see how quick or slow that is. You see, %timeit is very useful to get a quick feeling. My Monte Carlo sweep took 791 milliseconds. Is that fast or slow? I see some thumbs down. I'm going to be thrown to the lions. That's abysmal. It's comparable to a second, for one little sweep through this dinky little array, on a modern computer. Come on, what year are we in? So this is not going to work. How do I know? I want to do, I don't know, 10,000 sweeps on bigger things, right? And then the seconds add up, right? Actually, if I wanted to do 10,000 sweeps, how long would it take? About 10,000 seconds. But how long is that? What's that? Three hours. How many seconds in an hour? 3,600? Yeah, about three hours. That's right. That's pretty garbage. I wouldn't want to wait for that. So let's speed it up. Where is the time going? Okay, let's use this %%prun magic. So, what does prun do? Okay, two things to point out. It starts with a percent, so it's a magic command that Jupyter provides. And the difference between the one with a single percent and the one with a double percent is that the double-percent one applies to the whole cell.
I could write a whole block of code and this thing would do something with the whole block; in fact, it would profile the whole block. That's what the p stands for: profile and run. The single-percent version just applies to the things on the line, so it would only time that one little call. Yeah, I think you can use %%timeit with a block and it'll just time the cell. And probably we could have used a single %prun with the mc_flip call, but okay, I wrote it this way. So what did this pop out? It gave us this little view. What it's done is tell us how many calls to different subroutines were made. Some of these are going to look very obscure, because they're calls being made inside things that we're calling, right? How much time was spent in them: total time, and per call, which is usually small. And they're ordered basically by how much time you're spending in each. The total in this case was 860 milliseconds, and we can see we have 10,000 calls to the energy function, which take, cumulatively, cumulative time, 730 milliseconds. That's almost all the time; it's spent computing the energies. If I remember right, if you look at cumulative time and add up all those numbers, it's way bigger than the total, right? Whereas total time, in principle, should add up to the total if you sum all the entries. The point is clearest for mc_flip, because we only call it once. mc_flip has one call, and it took 0.861 seconds; that's the cumulative time. But of course, most of that time was actually spent in subroutines that it called, right? The amount of time that the profiler attributes to the code inside mc_flip itself, as opposed to time in subroutines, is the total time: that's the 120 milliseconds, which can be attributed to the code around the calls to other subroutines in mc_flip.
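%%prun is essentially a Jupyter wrapper around the standard-library cProfile; outside the notebook you can get the same call-count and cumulative-time table like this (the energy and sweep functions here are simplified stand-ins, not the lecture's code):

```python
import cProfile
import io
import pstats
import numpy as np

def energy(sigma):
    # Simplified stand-in for the energy calculation.
    return float(-np.sum(sigma[:-1, :] * sigma[1:, :]))

def sweep(sigma, n):
    # Call energy many times, like the naive Monte Carlo sweep does.
    for _ in range(n):
        energy(sigma)

sigma = np.ones((100, 100), dtype=int)

# Profile the call, then print functions sorted by cumulative time,
# which is the ordering %%prun shows in the notebook.
profiler = cProfile.Profile()
profiler.enable()
sweep(sigma, 1000)
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
print(stream.getvalue())
```

In the printed table, cumulative time includes time spent in subroutines, while total time (tottime) is only the time attributed to the function's own code, exactly the distinction discussed above.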
Does that make sense? The energy function, for example, is called 10,001 times, and you can see there are a bunch of rather obscure internal methods which get called. Here's the random number generator getting called 10,000 times; here are the reduce/sum internals, getting called 30,003 times, so three times for each energy call. These are basically the sums: per call they're very fast (this sum, for example, is very fast), but the cumulative time spent in them is 380 milliseconds. The total time is the part which isn't actually going into reduce, I think. Okay, you can start seeing how these calls break down. The main point when you're profiling, and usually you can tell because it sorts them by roughly the amount of time spent, is that we're wasting a lot of time calculating energy; it's right up there at the top. Maybe I don't understand everything further down, and I'd have to think it through, and it's fun to try to figure out why it went that way, but it points us to what we should think about. Incidentally, you might have guessed, if I asked why this is slow: oh, I didn't vectorize this explicit for loop I told you about. But actually that wasn't the thing taking much time, right? So let's go on. The issue is that the energy itself takes time of order n, right? Where n is the number of spins, because we have to walk through essentially three terms per spin: one is the field, one is the horizontal bond, and one is the vertical bond. That's going to scale as order n as I make the lattice bigger. And we call it of order n times, because in every Monte Carlo sweep we step through every spin, right? Which means that our mc_flip is order n squared: it takes of order n squared time to do one sweep through the system. That's bad. In fact, you don't need the full energy to be recomputed every time you flip one spin.
All you need is the change in the energy. And in a local model, the change in energy when I make a local change is itself local, which means I should be able to do an order-one calculation, one that doesn't depend on the size of the system, to figure out how much the energy changed by. This is why locality is really important. If you start trying to do Monte Carlo on some fully connected model, a Sherrington-Kirkpatrick spin glass kind of model, you're going to find that you can't avoid some big exponents here; they're very hard to simulate because the energy updates aren't local. So we only need Delta E, which is local: we only need the coefficient, in the energy, of the sigma term we wish to flip, in the fixed background of every other spin. So let's use that insight to improve our algorithm. Here's a new version of the MCFlip code; let's take a look at it. Again, I'm explicitly looping through all the spins. For each one I calculate the local field, the field that that spin feels. The local h, of course, includes the explicit applied field. Then I go to each of the four bonds: if I'm updating this spin, I need to include the effect of this bond, this bond, that bond, and that bond. This is the horizontal bond at (i+1, j), so J sigma_{i+1,j} acts as an effective field on sigma_{ij}; this is the horizontal bond on the other side, the one behind it; and these are the vertical bonds above and below it. Notice these ifs are just checking that I'm not on the boundary. If the spin I'm looking at is on the boundary, then there isn't a horizontal bond to the left of it; that's what these if statements are doing. I don't get a field contribution from the left if I'm sitting at the boundary. That is literally what open boundary conditions means. Okay, and then I did the sampling a little bit differently.
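To spell out the locality argument in symbols, consistent with the E = -J (sum over bonds) - h (sum over spins) convention used here: the terms involving one spin can be isolated, and flipping that spin costs

```latex
E = -J \sum_{\langle ij \rangle} \sigma_i \sigma_j - h \sum_i \sigma_i
  = -\sigma_k \underbrace{\Big( J \sum_{j \in \mathrm{nbrs}(k)} \sigma_j + h \Big)}_{h^{\mathrm{loc}}_k}
    + (\text{terms not involving } \sigma_k),
\qquad
\Delta E_{\sigma_k \to -\sigma_k} = 2\,\sigma_k\, h^{\mathrm{loc}}_k .
```

The local field h_loc involves only the (at most four) neighbours of the flipped spin, so computing it is O(1), independent of the lattice size.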
I now know the total local field felt by the spin. I can compute its tanh to get the magnetization that spin would have in that background; the probability of being up is that magnetization plus one, over two; and then I just set the spin randomly to minus one or one with that probability. There are names for the two different variants of the update rule, this one versus the previous one, but I've forgotten what they are. Whatever; this also perfectly well satisfies detailed balance with respect to the relative Gibbs weights. So this should now take order n. There's the sweep, this for loop over everything in spins, so that's a factor of n times however long the body takes, but the body is now not n-dependent: all the sub-terms in here take time that's only order one. I time it, and yes, it's giving the correct results. And how long did that take? 73 milliseconds. How about for you guys? Same ballpark. We got, what, a factor of 10? Something like that. So this was an algorithm-first optimization: it's now order n instead of n squared. But it's still slow. If I were running one sweep per frame and trying to draw an animation, a movie, 73 milliseconds per sweep means a frame rate of about 14 frames per second, which is jerky; your computer should be able to do better. So can we vectorize it? What other tricks can we use to make this faster? We can. There are, as I say here, two tricky bits. One is the special cases on the boundary, which, in calculating the energy, we figured out how to handle by looking at sub-blocks and shifting them around a little. The other is that if we're going to vectorize, we want all of these random updates to be done in parallel, in one go.
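A sketch of the O(n) sweep just described: local field with boundary checks, then a tanh resampling of each spin in its fixed background. The function and variable names here are mine, not the notebook's.

```python
import numpy as np

def mc_sweep_local(sigma, J=1.0, h=0.0, rng=None):
    """One sweep with an O(1) local-field update per spin, O(n) total.

    Each spin is resampled in the fixed background of its neighbours, with
    p(up) = (tanh(h_local) + 1) / 2, which satisfies detailed balance for
    the relative Gibbs weights (temperature absorbed into J and h).
    """
    if rng is None:
        rng = np.random.default_rng()
    n_rows, n_cols = sigma.shape
    for i in range(n_rows):
        for j in range(n_cols):
            # applied field plus only the bonds that actually exist
            # (open boundaries: edge spins have fewer neighbours)
            local_h = h
            if i > 0:
                local_h += J * sigma[i - 1, j]
            if i + 1 < n_rows:
                local_h += J * sigma[i + 1, j]
            if j > 0:
                local_h += J * sigma[i, j - 1]
            if j + 1 < n_cols:
                local_h += J * sigma[i, j + 1]
            p_up = (np.tanh(local_h) + 1.0) / 2.0
            sigma[i, j] = 1 if rng.random() < p_up else -1
    return sigma

rng = np.random.default_rng(0)
sigma = np.where(rng.random((8, 8)) < 0.5, -1.0, 1.0)
mc_sweep_local(sigma, J=0.6, rng=rng)
```

The inner body is O(1), so the whole sweep is O(n); the Python-level double loop is exactly the part the vectorized version below the lecture removes.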
So I don't do them sequentially: I want to write a line of code which is going to do them all at the same time, from the point of view of the code. Sorry, say it again? Right, the constant changed. Yes, the constant is important, and we're going to improve it. The main point of saying the old version was order n squared is that we changed our algorithm, and the code, quite a bit. As was just pointed out, that doesn't mean I literally get a factor of 10,000 from this update. If I were to plot (I don't remember if I have this later) the amount of time each algorithm takes as a function of how big a system I'm studying, one would show some constant plus linear behavior, probably with a little bit of an offset as well, and the other would go as n squared; but to get into the asymptotics you have to look at how big n is, and so on. There can be overheads, and they change, so we can't just divide by n. That's true, yeah? Now, what is the local field? Sorry, I used a capital H; that's terrible, I'm going to use a lowercase h. So what is the local field? It's the field felt by, say, some particular spin, sigma_1. I look through the energy for all the terms that involve sigma_1, and I find there's a term here which is linear in sigma_1, and there are four terms here that are linear in sigma_1. So I can write that piece as sigma_1 times the quantity minus J, sum over neighbours of 1 of sigma_j, minus h, which came from this term here. (Is there supposed to be a u in neighbours, because I'm in Europe? Anyway.) Then there are all the other terms, which I didn't write, but that coefficient is the local field felt by this spin in a fixed background of the other spins. I'm holding these fixed in order to do the computation. Does that make sense? Okay. So, there are some tricky bits to vectorizing this code.
There's the special case on the boundary. Now, this is actually related to what I was just saying. If I vectorize, what I want to happen is that I write one line of somewhat tricky, slicey code, and the computer iterates through doing random updates on every spin in that view of my spin field. But it has to do that in a way which is correct. I can only randomly flip a spin, according to this rule, in a fixed background; that's what we were just looking at. So I don't actually want to do all of them at the same time, because then the neighbours on the left would have changed by the time I flipped the next spin, and the local field would have changed. I have to somehow figure out a way to do it holding the backgrounds fixed. There are two pieces to dealing with this in a vectorized way (well, there are probably other ways to do this; this is one way). One: set the boundary to zero and only update the bulk; you'll see what I mean in a second. Two: vectorize over sublattices. Okay, let's look at the code. This first function just sets the first and last columns and the first and last rows to whatever value I pass it, so if I call set_boundary with zero, it sets those outer elements of the matrix to zero. The reason I'm doing that is that if I've set this neighbour to zero, then I don't need to special-case this spin: when I multiply by zero, I won't get a contribution to its field. And down here I'm defining an MCFlip which actually works by calling, four times, this mc_flip_sublatt (for sublattice), with some indication of which sublattice it is. What mc_flip_sublatt does is update the sublattice defined by a and b, randomly flipping with probability set by the relative Gibbs weights. So the sublattices are this.
If I were to look at this spin, this spin, this spin, and that spin, and I randomly update this one and then randomly update that one, the local field felt by the second is unchanged by having updated the first, because it's buffered by a fixed spin in between. So what I want to do is pick an order such that I can compute all the local fields in parallel, in a vectorized way, in the fixed background of the spins that aren't in squares here, then do the update for each of these, and then just move over and do that sublattice, that sublattice, and that sublattice. There are four sublattices with that property. Makes sense? I see some nodding, yeah? Would a chessboard update do the same? So let's see: on the square lattice, here's a chessboard update. What do you think? Who thinks it'll work? Who thinks it won't? Wow, who's completely given up? Come on, no sitting on fences here. Learning is about reaching out ambitiously and then being smacked aside and disgraced, right? Okay, fine. Yes, it would work; it would actually be totally legitimate, because again, I could calculate all the local fields for these spins and do random updates to them, and if I randomly update this one, it doesn't change the local field on that one. That's correct. However, it is much harder to write a slice that looks like a checkerboard than a slice like this one, which is just every other row and every other column. The slice that looks like a checkerboard is every other column, but staggered from one row to the next, and that doesn't have a simple slice representation. You could probably do it with even sneakier striding tricks, but it's not worth it. This works fine, and all we're doing is reordering a little bit.
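As an aside, the claim that a simple strided slice picks out one of four mutually non-adjacent sublattices can be checked concretely. A toy check, assuming a 6-by-6 lattice whose boundary has been zeroed so the interior is 4-by-4:

```python
import numpy as np

# label every site by its (row, col) index so we can see what a slice selects
n = 6
rows, cols = np.indices((n, n))

sublattices = {}
for a in (0, 1):
    for b in (0, 1):
        # every other row and column of the interior, offset by (a, b)
        r = rows[1 + a:-1:2, 1 + b:-1:2].ravel().tolist()
        c = cols[1 + a:-1:2, 1 + b:-1:2].ravel().tolist()
        sites = list(zip(r, c))
        sublattices[(a, b)] = sites
        # no two sites within one sublattice are nearest neighbours,
        # so all of them can be updated in a fixed background
        for r1, c1 in sites:
            for r2, c2 in sites:
                assert abs(r1 - r2) + abs(c1 - c2) != 1

print(sublattices[(0, 0)])   # [(1, 1), (1, 3), (3, 1), (3, 3)]
```

The four (a, b) choices together cover every interior site exactly once, which is why four sublattice updates make one full sweep.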
Instead of doing the checkerboard, what does it really mean? It means that we go here, here, here, here, then move over and do here, here, here, here, and so on. Okay, so we do them in a slightly different order than the checkerboard update would have, but I can implement it much more easily. Okay, so how does it actually work? The slice we're interested in is a fancy one: the step size is two, so it's every other row and column, and we're starting from one. It is a complicated slice, because I have two things going on here. This is getting very messy; let me just draw it again, a little bigger. So remember I said I had to deal with boundary conditions, and I did that by zeroing those elements of my array. I want them to just stay zero; they come in as basically no effective field from the bond in their direction. So when I do this sublatticing, I want, for example, to take this sublattice, compute the local field on those four sites, update those four sites, and then move over and do the next one. So the slice I need starts from one: this a and b are going to be zero or one, telling me whether I'm on the even or odd rows and the even or odd columns, and I need to start from row or column one so that I'm not on the zeroth one, where this row of zeros and column of zeros is. Then I go from there up to the last column, which, remember, is not inclusive in a slice, so it won't include the column of zeros, with a step size of two. And now what I need to do is calculate the local h on that slice of spins: all of them feel the global field h, plus J times the spin to the right, so I just shift the entire slice over by plus one in the row; for the one to the left, I subtract one from the row, which means subtracting one from the start and the stop.
In these, the row is the same and the column is shifted up or down by one. So is it clear how I constructed this expression? All right, you might have to stare at it a bit if you're not used to this. Then I have this nice slice. In the six-by-six case, because I've got zeros on the outer boundary, it's four by four inside, so this would be a two-by-two array of local fields. I then do the exact same thing I did up there: I calculate the magnetization from that array of local fields, I calculate an array of probabilities, and then I randomly flip the spins at those positions according to a block of random numbers of exactly the right size. That's half the lattice minus one: six divided by two is three, minus one is two, which accounts for the shift of one in this picture, and similarly for the other direction. So that's a two-by-two block of random numbers, matching the two-by-two block of probabilities, and this sets the spins to plus or minus one according to that probability rule. So this compact bit of code does all of those updates on that sublattice in those couple of lines, and because it's vectorized like that, NumPy can implement the iterations internally, basically at machine speed, as opposed to for loops that Python has to interpret. So let's time it. Of course, I'm going through this as if it's obvious; in reality I spent some time debugging it and checking that it worked, and none of that is in my professionally produced notebook for us to go through in this class. You won't get these things right the first time, and that's why you have to check: did this do the same thing as the simpler version? Actually a better practice, and maybe I should have done it this way, is to create a new version of these functions and then make sure they do the same thing given the same inputs, because this one is obviously a bit more complicated to interpret. This is why optimization is hard.
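Putting the pieces together, here is a hedged sketch of the whole vectorized scheme: zeroed boundary plus four shifted slices. The names (`set_boundary`, `mc_flip_sublattice`, `mc_sweep_vectorized`) and slice details are my reconstruction; the notebook's version may differ.

```python
import numpy as np

def set_boundary(sigma, value=0):
    """Set the outer ring of the lattice; zeroing it means the bulk
    update needs no special cases at the edges."""
    sigma[0, :] = value
    sigma[-1, :] = value
    sigma[:, 0] = value
    sigma[:, -1] = value

def mc_flip_sublattice(sigma, a, b, J=1.0, h=0.0, rng=None):
    """Update one of four sublattices, fully vectorized.

    (a, b) in {0,1} x {0,1} selects every other row/column of the interior.
    Sites within one sublattice share no bonds, so they can all be
    resampled simultaneously in the fixed background of the rest.
    """
    if rng is None:
        rng = np.random.default_rng()
    # local field on the sublattice: applied field plus the four neighbours;
    # the zeroed boundary silently supplies the "missing" edge bonds
    local_h = h + J * (sigma[2 + a::2, 1 + b:-1:2]      # row + 1 neighbour
                       + sigma[a:-2:2, 1 + b:-1:2]      # row - 1 neighbour
                       + sigma[1 + a:-1:2, 2 + b::2]    # column + 1 neighbour
                       + sigma[1 + a:-1:2, b:-2:2])     # column - 1 neighbour
    p_up = (np.tanh(local_h) + 1.0) / 2.0
    sigma[1 + a:-1:2, 1 + b:-1:2] = np.where(
        rng.random(local_h.shape) < p_up, 1, -1)

def mc_sweep_vectorized(sigma, J=1.0, h=0.0, rng=None):
    """One full sweep: update each of the four sublattices in turn."""
    for a in (0, 1):
        for b in (0, 1):
            mc_flip_sublattice(sigma, a, b, J=J, h=h, rng=rng)

rng = np.random.default_rng(0)
sigma = np.where(rng.random((16, 16)) < 0.5, -1.0, 1.0)
set_boundary(sigma, 0)
mc_sweep_vectorized(sigma, J=0.7, rng=rng)
```

Each neighbour term is the interior sublattice slice shifted by one in the start and stop, exactly the construction described above, so all four shifted slices have the same shape as the sublattice being updated.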
So, yep, okay, didn't hear, say it again? Is the applied magnetic field correct in the formula, with only one h? Ah: this h is a number, and this is an array. If I add a number to an array, it adds it to every element. The field doesn't care about what the spin is doing; the total energy is the field times the spin, but the field itself is just the coefficient of that. So: this now takes 400 microseconds. That's beginning to get reasonably fast. What was it before? 73 milliseconds, right? So a factor of about 200. Okay, that's pretty quick. Now let's try animating it; let's see if this works. And, oops, I left in two copies of this block by mistake as I was debugging. Is it going to work? This is computing 1,000 frames, setting up an animation, and embedding it in the browser. If it works, that'll be great. We can take a quick look at this while it's computing; hopefully it'll show. This was more or less just to show you can do it. It's a basic animation (there are other ways to do it) using the animation library from Matplotlib. We create the image; this animate function just does a sweep and then updates the data on the image, set_data(sigma), so it updates it in place on the image in the figure. And this FuncAnimation thing basically takes a figure, an animate subroutine, and the number of frames you want (this interval argument is actually not important because of the way we're embedding this in the browser), and allows you to output animations. We're doing it in-browser with JavaScript, so we can control it in the browser and watch the Monte Carlo sweep through our critical Ising model with open boundary conditions. Sorry, what was that? It doesn't work on the server? Oh, it must be... okay, I apologize.
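For reference, the pattern just described (create an image, update it in place inside an animate callback handed to FuncAnimation) looks roughly like this sketch. The `mc_sweep` here is a dummy stand-in for the real Monte Carlo sweep, and the Agg backend is used so the sketch runs headless outside the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; in the notebook the
                       # inline/JS machinery displays the animation instead
import matplotlib.pyplot as plt
from matplotlib import animation
import numpy as np

rng = np.random.default_rng(0)
sigma = np.where(rng.random((32, 32)) < 0.5, -1, 1)

def mc_sweep(s):
    # dummy stand-in for the real Monte Carlo sweep: flip a random 10% of spins
    mask = rng.random(s.shape) < 0.1
    s[mask] *= -1

fig, ax = plt.subplots()
im = ax.imshow(sigma, interpolation="nearest")

def animate(frame):
    mc_sweep(sigma)       # advance the simulation one sweep
    im.set_data(sigma)    # update the existing image in place, no new figure
    return (im,)

ani = animation.FuncAnimation(fig, animate, frames=100, interval=50)
# In the notebook you could embed it with, e.g., HTML(ani.to_jshtml());
# ani.save("ising.mp4") writes a movie if an encoder like ffmpeg is installed.
```

The in-place `set_data` is the key design choice: the figure is built once, and each frame only swaps the pixel data, which is much cheaper than redrawing from scratch.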
I swapped out the animation backend because I thought this one looked cooler, and I didn't realize it wasn't set up properly on the server; the server has been a little bit of a mess. That animation object we created is capable of outputting in various other ways; you could write out a movie file, some kind of MPEG or the like, if you wanted to. The problem with animations and movies is that you always have to make sure the actual encoder libraries are available. This can output to that, but maybe it's not installed properly on the server because I didn't install all the right libraries. Did anybody at least get it working locally? The simplest way to animate, if you weren't working in the notebook, is to just keep updating the figure, which you can do when you're working locally as well. So part of the issue with the animation is that we're doing it in the notebook, and it's a little bit more fiddly to get that right. But anyway, our code works; I'm sorry it didn't work on the server. Did anybody else get the animation working locally, at least? Woo-hoo, victory. Partial, but I'll take it. Okay, so that's it; I'm not planning to say anything more at the global level. I have a set of tasks for you guys: play with this code, try to improve it, and do physics with it. You can try to measure the correlation time of this Monte Carlo at several different couplings; there's some code below that you can read and try to understand to get some help with that. You can try to extract the magnetization, energy, susceptibility, and specific heat as a function of the coupling J as you drive across the transition, and plot them. You can do it as a function of system size and see if you can see sharpening at the transition. And here's a challenge more on the coding side than just doing physics: can you modify the code to use fixed rather than free boundary conditions? That's not very hard.
What about periodic boundary conditions? That's a little bit more complicated, but also not that hard, and you should do it without slowing the code down: you don't want a change of boundary conditions to completely change the scaling of your code, right? So yeah, and that's it. I've given you some code here. The magnetization, for example, is just the mean of the spin field at any given Monte Carlo time, excluding the boundary spins, which are just zero, so you can see it. Of course, the measured magnetization would be an appropriate time average over your Monte Carlo sweeps. And this run_sample function has documentation, but basically it will keep track of the history of the magnetization and the energy over some number of steps, with sub-sampling, and return it to you. And then there's a bunch of code down here to help you with computing autocorrelations and looking at time dependence in the Monte Carlo. So that's it. I want you guys to play with it, do as much of this as you feel like. Go for it, and thank you.
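As a final pointer for the correlation-time task: one possible approach (a sketch, not the notebook's actual helper code) is to compute the normalized autocorrelation of the magnetization history and read off where it decays.

```python
import numpy as np

def autocorrelation(x):
    """Normalized autocorrelation C(t) of a time series, with C(0) = 1.

    The correlation time can then be estimated, for example, from the
    first lag where C(t) drops below 1/e.
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    # full linear correlation; keep only non-negative lags, normalize by lag 0
    c = np.correlate(x, x, mode="full")[len(x) - 1:]
    return c / c[0]

# toy check on an AR(1) process with a known correlation time tau
rng = np.random.default_rng(1)
tau = 10.0
phi = np.exp(-1.0 / tau)
series = np.zeros(5000)
for t in range(1, len(series)):
    series[t] = phi * series[t - 1] + rng.normal()

corr = autocorrelation(series)
tau_est = int(np.argmax(corr < np.exp(-1)))  # first lag below 1/e
print(tau_est)
```

Applied to the magnetization history from run_sample at different couplings, the same estimator shows the correlation time growing as you approach the transition.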