I think we're all in pretty good shape. Everybody that I've looked at has a vector called GCProfile, which has these values here. We find that this is relatively well written: if we want to change the resolution, we just change a single number and we can plot things at higher resolution. Now, Tammy had mentioned that we could look at a histogram, which, since we have that vector, is now supremely easy to do. All we need to do is run a histogram of GCProfile, and that's what it looks like. So indeed, the frequency is highest around 50%. The bulk of the distribution is somewhat lower, so the average is away from 50% — as we've seen, the average should be around 0.47 — and there's a long tail toward the high values; in these blocks here we have regions with a GC content higher than 80%. So that's good. We'll need histograms a little later.

Now, the initial thing we were wondering about is: what are the dinucleotide frequencies? If we want to look at dinucleotide frequencies, we need a vector of dinucleotides. We could do that on the fly, concatenating individual pairs of nucleotides as we go along, but I think we'll actually make a vector of dinucleotides. But how? Can you use any of the principles we've encountered here to just make a vector of dinucleotides? Let's not go for anything elegant or crafty — do it in the simplest way possible. No — because we don't actually have the dinucleotides there. A table would count the elements of our large vector, but those are just single characters. What we want is a vector whose first element is just "AG", whose second element is "GT", the third "TA", the fourth — what was it — "GA", and so on. So let's make a vector of dinucleotides; we'll probably have to play around with it for a little bit. Let's make a small vector, call it x, of length 20, and play with that to develop what we're doing here. Fair enough: if we can do it for 20, we can do it for 100,000.

OK, now: dinucleotides. That's one way to do it — yes, we can take the same approach as iFrom and iTo and paste together the elements we have there. Right. Any other approaches? Yeah, why not — this is something we've used before. I wouldn't write it that way myself, but it's fresh in our minds and it would certainly give the right result. So what should the iFrom values be? From the first to the — we want all of them, right? Yeah, to the second-to-last: to length(x) minus 1. Now, when you write something like that, you have to be very careful — you'll see what happens now, because what we have here is a question of operator precedence. Look what happens: this starts at zero and goes to 19. That's not what we wanted; we wanted something that starts at one and goes to 19. This is a frequent bug, especially when people write for loops. Operator precedence means the colon is evaluated first, generating a vector from 1 to 20, and then 1 is subtracted from that vector, which gives a vector from 0 to 19. That happens a lot. So you have to put this in brackets: you want length(x) - 1 evaluated as the single value 19, and then the expression becomes the expected 1 to 19. So be very careful: when you build a range of indices, use brackets if there's any kind of arithmetic involved. Otherwise you will get the range first, and whatever arithmetic you do is applied to the entire vector.
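In code, the pitfall looks like this — a minimal sketch, assuming a toy 20-nucleotide vector x built by simple repetition:

    x <- rep(c("A", "G", "T", "A"), 5)   # a toy vector of 20 nucleotides to play with

    1:length(x) - 1    # 0 1 2 ... 19 -- the ":" is evaluated first, then 1 is subtracted
    1:(length(x) - 1)  # 1 2 ... 19   -- the brackets give what we actually wanted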
All right. So our iFrom goes from 1 to 19, and our iTo goes from — well, if the first nucleotide of iFrom is 1, the first nucleotide of iTo has to be 2. So iTo is 2 to 20. Now we need to build — somebody's mother is calling; what is that? Do you need to get this? — all right. Now we need to store the results somewhere, and this will be a character vector. How long is it? Length of iFrom. Now we can iterate: for i in 1 to length(iFrom), and then we assign the value dinuc[i]. We need to put the two characters together — the two characters are x[iFrom[i]] and x[iTo[i]] — but how do we combine them? Paste. And we need to set the separator to the empty string. Nested expressions like that are quite frequent. We could have pulled this apart, assigned the pieces to intermediate variables a and b, and then pasted a and b, or something like that; that's sometimes preferable to help structure the code, but not necessary. This is fine. As a sanity check, I usually do something like setting i to 7 and then evaluating: iFrom[i] is 7, iTo[i] is... 42? That's not what we wanted. So what's the problem here? Ah — the problem is that I never executed this; this is still the old version of the block. So let's try it again: 8. That makes a lot more sense. Pasting these two together gives GA, and comparing — 1, 2, 3, 4, 5, 6, 7 — yes, positions 7 and 8 are G and A. So that looks OK. So we have AG, GT, TA, AG, GA, AG, and so on. Overall we have 19 dinucleotides, which is the expected number, and we can immediately see that each dinucleotide ends with the nucleotide that the next dinucleotide starts with. Right.
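Assembled, the block we just developed reads roughly like this (a sketch; names as used in class, applied to the toy x from above):

    iFrom <- 1:(length(x) - 1)         # positions of the first nucleotide of each pair
    iTo   <- 2:length(x)               # positions of the second nucleotide of each pair
    dinuc <- character(length(iFrom))  # pre-allocate the result vector
    for (i in 1:length(iFrom)) {
      dinuc[i] <- paste(x[iFrom[i]], x[iTo[i]], sep = "")
    }
    head(dinuc)    # e.g. "AG" "GT" "TA" ...
    length(dinuc)  # 19: one less than length(x)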
So now we can get the counts with table(). But actually, what we did here used x — and I actually need mySeq. OK. head() and tail() tell us that this is what we expect, and now we can get the counts: simply table the vector. All right — does that look reasonable? If they were all equally distributed, how many would we find in each category? How many different dinucleotides exist? 16. Right. So if it were all random, how much would we expect? 100,000 divided by 16: 6,250. And indeed it oscillates around that — some are quite a bit higher, some quite a bit lower. But remember, an equal distribution is something we would only expect if A, C, G, and T were equally frequent among our nucleotides. We already know that our GC content varies, and that it is around 0.47, not 0.5. So if we were to simulate these frequencies, we would need to take that into account: we would expect less G and C than A and T, simply because the frequencies of G and of C are lower than the frequencies of A and of T in our data. So whenever we need to evaluate whether what we find in an analysis differs from what we expect, we need some kind of model of what the nucleotide frequencies should be. One way is to set up a simple simulation from first principles: build a loop in which we choose each nucleotide with the frequency with which it occurs in our genome, randomly combine them into dinucleotides — AT, GT, whatever appears — and thus take a synthetic approach. But an alternative approach that's actually very good is to work with shuffled sequences.

If we take a stretch of nucleotides, shuffle them randomly, and then do our dinucleotide count, that tells us something about the distributions we expect if there's no information in the positions — if everything is just random. And that's really easy to do: in R we have a function called sample(). sample() is superbly versatile, used and needed all the time to generate the random background distributions we need for statistical evaluation of significance. You see, when we do statistical tests — t-tests, F-tests, ANOVAs, what have you — we usually assume that the underlying population distributions are in some way normal. In biology, as you know, nothing is further from the truth: none of the data we look at is simply normal. You've seen our GC distributions — that's not at all what we would expect from random sampling. There's information everywhere. To take that into account properly, it's often better to calculate what's called an empirical p-value: simply run a simulation and count how often the simulated values exceed our observed test statistic, rather than applying some particular probability distribution which may not be applicable — because we're looking at biology, and information, evolution, and selection are everywhere. This is why tools like sample(), which let us shuffle sequences, are really useful and really important.

Let's look at this. This is the numbers 1 to 5. If I just apply sample() to a vector like that, I receive a permutation of the vector. Now, that's really funny — this is a random result, but it just happened to come out in order. Normally it's something like 2, 1, 5, 4, 3, or 5, 4, 2, 1, 3, or 1, 2, 4, 5, 3, and so on — or, why not, 1, 2, 3, 4, 5, as we had it here. So if we give it a vector, it permutes that vector. We can also draw samples of a given size, say 3: sample(1:5, 3) gives us 4, 1, 2, or 4, 2, 1, or 3, 4, 2. Note that this is not like rolling dice — this is sampling from an urn, without replacement. In these samples we get only values that were in our initial vector, and if our initial vector has unique values, we get unique values back. If we request seven values from that vector, we get an error, because by default the parameter replace is FALSE in sample(). We can set replace = TRUE, and now we are rolling dice: 4, 5, 2, 2, 2, 5, 3 — the 2 is repeated three times here. Incidentally, this doesn't need to be numbers: we can sample from a vector of letters — for example, the letters that represent our nucleotides — and then we get a vector of seven randomly sampled nucleotides, or 100, or 100,000. So we could use that to generate 100,000 randomly sampled nucleotides. But that would give us equally distributed nucleotides, and our nucleotides are not equally distributed: A, C, G, and T are not 25% each in our sequence, whereas if we run sample() in this way, we get 25% each. We could set the parameter prob, which defines a target probability for each of the values. But here, to calculate random frequencies, we'll actually do something different: we'll take our mySeq — 100,000 nucleotides — and permute it. If we permute it, the nucleotides just change places, so we know the nucleotide composition is going to be exactly the same as in our input.
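Roughly, the sample() calls we just walked through — a sketch; the outputs shown are one possible random result, and the prob values are purely illustrative:

    sample(1:5)                      # a permutation, e.g. 2 1 5 4 3
    sample(1:5, 3)                   # three values, drawn without replacement
    # sample(1:5, 7)                 # error: can't take 7 values from 5 without replacement
    sample(1:5, 7, replace = TRUE)   # like rolling dice: e.g. 4 5 2 2 2 5 3

    sample(c("A", "C", "G", "T"), 7, replace = TRUE)   # random nucleotides, 25% each
    sample(c("A", "C", "G", "T"), 7, replace = TRUE,   # biased sampling via prob
           prob = c(0.265, 0.235, 0.235, 0.265))       # (illustrative values only)

    mySeqRand <- sample(mySeq)       # permute mySeq: same composition, new order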
So this makes it superbly simple. There we go: mySeqRand has the same 100,000 elements, but in a different order. Now I could copy this and paste the dinucleotide code a third time — and I shouldn't even have copied it the first time. That's not the way we do things. Whenever you find yourself copying code, that's a red flag. You shouldn't do that; you should write a function instead, even if it only lives in your script. So we'll write a function. We'll take this entire thing here and say: dinucTable is a function which takes a sequence vector x. Paste all of that in here: iFrom is 1 to length(x) minus 1, iTo is 2 to length(x), dinuc is character(length(iFrom)), and so on, and so on, and we return the result. There we go — now we have a function. And it doesn't have a header, which is awful, and it doesn't define what the return value is, and there are no code comments, and you shouldn't do it that way. But anyway. Once the function is defined, our variable dinuc is better calculated as a call to it. Though I shouldn't call it dinucTable — I'm sorry, we're not producing the table here, we're just making the vector. So let's change that: we'll call it dinucVector. So dinuc is a large character vector, and so on; head(dinuc) — that's the thing we need. And now I don't need to copy, paste, and change mySeq in three different places. I don't even need to produce mySeqRand: I can just pass the result of sampling my sequence straight into dinucVector. There we go. So let's do the same checks: head(dinucRand), tail(dinucRand).

And I'll show you something really useful. I need to change these two variable names, and what I can do is hold the Alt key and select two or more lines at the same time. You see how I have a very long, tall cursor here? I can type along that tall cursor, and I don't need to type things twice or delete along each line. This is just two lines, but imagine you have 20 — this is a real time saver. Sorry, how do you do that? I didn't see that shortcut. OK: hold Alt and drag, and your cursor spans a number of lines. Is it the same Alt key on Windows? Yes — Alt and drag, and then you can type into many lines at once. I often line up my code so that things are vertically well aligned precisely for that reason: it makes things I need to repeatedly change much easier to edit. This feature, incidentally, is why, when I need to edit data files that aren't in the right format for me to read, I sometimes edit them in RStudio rather than some text editor — this feature makes some things extremely easy and convenient.

OK. So we have head(dinucRand) and tail(dinucRand), and that looks reasonable. I'm running out of imagination here, so I'll just call the two tables tObs and tRand. OK: we have tObs and tRand, and these are the frequencies. Yay. And now, finally, we want to plot this as a bar plot. The easiest thing is barplot(tObs). There we go. Now we'll refine it. First of all, we'd like to arrange the columns from largest to smallest, so we need some kind of sorting. Maybe I'll just post the code here, which would make things easier — it's actually very simple.
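As a sketch — with the header and comments the on-screen version admittedly lacks — the function might read:

    dinucVector <- function(x) {
      # Compute all overlapping dinucleotides of a nucleotide sequence.
      # x:     character vector of single nucleotides
      # value: character vector of the length(x) - 1 dinucleotides
      iFrom <- 1:(length(x) - 1)
      iTo   <- 2:length(x)
      dinuc <- character(length(iFrom))
      for (i in 1:length(iFrom)) {
        dinuc[i] <- paste(x[iFrom[i]], x[iTo[i]], sep = "")
      }
      return(dinuc)
    }

    dinuc     <- dinucVector(mySeq)           # observed dinucleotides
    dinucRand <- dinucVector(sample(mySeq))   # dinucleotides of the shuffled sequence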
I'll just call this codeSnips.R, and I put that function in there. So: dinuc is dinucVector(mySeq), dinucRand is dinucVector(sample(mySeq)), tObs is table(dinuc), tRand is table(dinucRand). Incidentally, I always end my script files with a little end-of-file comment as the very last thing, so I can guarantee that the file is complete. If you adopt that, the key is that you always have to do it: skip it once, and a complete file looks like a truncated one. Especially if you're working in a group where some members do this and some don't, it doesn't work. So if you can't get everybody to work this way, don't bother. But if everybody can cross their heart and hope to die and put in this little flag that indicates your files are actually complete, it is potentially useful.

So I save this as codeSnips.R. I go to my version control, I go to Commit — you shouldn't actually ever need to do this; I'm just demonstrating how I get something into the master repository from here. I select codeSnips.R to stage it for committing and uploading, it needs a commit message — "sharing code" — and then I click Push. Now this is up on GitHub. What you need to do is go here and click Pull. When you do that — just try it — the file codeSnips.R will appear among your files, and you can click on it and it will have that code. I think I'll adopt that: whenever I do something a little more complex, I'll throw it into that file and upload it, and you can download it. This may save a few of the red post-its going up. It's a bit of a double-edged sword: if it induces you to rely on the code being posted rather than understanding what it does and being able to write it on your own, I'm not helping you. So it takes discipline. However, with that code and your own efforts, you should be able to come up with a table called tObs and a table called tRand which hold the information about the dinucleotide frequencies. Or actually — this is often misstated — these are not frequencies, they are counts. Frequencies are counts divided by the number of observations.
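In code, that correction is just a division — a sketch:

    fObs  <- tObs  / sum(tObs)   # frequencies: counts divided by the number of observations
    fRand <- tRand / sum(tRand)  # here, both sums are 99,999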
OK. So we've already seen the bar plot of tObs, which is very simple. As I said, there are a couple of things I'd like to do with it. First, we should convert counts to frequencies. In our case the totals for the reference and the observed distributions are the same, but that's not necessarily so: if our observations cover 100 million nucleotides and, to save time in whatever simulation we're doing, we simulate only 1 million, the counts would be very different. So we reduce both to frequencies by dividing by the total number of observations. Secondly, what are these bar labels? They're dinucleotides, right? They're taken from the names in the table. Why don't we see all the names? That's not nice — which dinucleotide is this one? We don't know, because it doesn't show. And why doesn't it show? Because there isn't enough space to print them. So we need to reduce the size of the axis labels so that they can all actually be printed. That's one thing to do. Another thing is that this is just a random ordering here: I'd like to sort the bars so that we have the tallest at the left and the lowest at the right. And the final thing I'd like to do is show, in the same bar plot, what the expected frequencies are — like the plot we looked at way, way earlier when we began this morning. Our target plot would look something like this, with a legend. (I don't like the position of the legend — why is it so odd now, anyway?)

OK, that's a number of things to do. Let's start with the frequencies; that's very simple. We can do one of two things: convert our tables to frequencies, or change our bar plot to plot frequencies instead. Let's go for the more explicit option and just say barplot(tObs / sum(tObs)). It's the same bar plot, but the y-axis scale has changed. I could also have written the number itself — or rather 99,999 — because that is, in fact, the number of observations. Now, if we do the same thing for tRand, we get this. First of all, our plot for tObs is gone; we have a new plot. Secondly — if you happened to catch it — the y-axis has changed: it went from 0 to 0.06 there, and it goes to 0.08 here. If we plotted things that way, we would be comparing two plots on different scales. When I sit on a student committee and see people plotting essentially the same thing with different y-axis scales, I try to be kind, but it's not good. If you have comparable plots of comparable information on the same slide, or in the same figure of a manuscript, always put them on the same scale — otherwise your plots are misleading. Now, both problems can in principle be addressed with a parameter of barplot() — which has a lot of parameters to begin with — called add: add = TRUE specifies that the bars should be added to an already existing plot. So what does that look like? Plot one, plot the other. You get the idea, right? We've now drawn one set of bars over the other, but we don't actually know which is which, and for values like these here, whatever is behind is obscured. There are ways to draw the bars of two plots side by side, but that's generally not considered a really good way of working with data either. What we can do in this case is use color — in particular, transparent color.
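So what we have at this point, as a sketch:

    barplot(tObs / sum(tObs))                 # observed frequencies
    barplot(tRand / sum(tRand), add = TRUE)   # shuffled frequencies, drawn over the same plot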
So let's digress for a moment on how color works in R. Let me add a color here — which one do I use? OK, this one. So I've got a colored bar plot. Now, you probably look at this in some confusion: how on earth does this odd sequence of characters specify a kind of royal blue? The solution is actually very easy. There are a lot of named colors you can use in R; you get them all if you just type colors(). Lots of them — colors you wouldn't even have thought of: powder blue, and salmon, and peach puff — really, peach puff, in different shades — papaya whip, misty rose, medium turquoise. I don't even know what misty rose is. So let's see what misty rose is: it's a valid color that can be specified by name in R, so we can call barplot() with col = "mistyrose". Misty rose — OK, there you go. But to build color gradients and color ramps and to tweak colors, I find it much easier to think about the component red, green, and blue values that make up a particular color on our computer — and that's what these color hex codes specify.

A color hex code always starts with a hash character, and then it has six digits: the first two specify the strength of red in the color, the next two the strength of green, and the last two the strength of blue. So the six digits specify red, green, and blue — but they do so in hexadecimal. In our decimal system we have the digits 0 through 9, ten of them, and whenever we exceed that range we have to add another digit and carry on. In hexadecimal we have sixteen: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F. So F is the largest single-digit value in hexadecimal. 00 in hexadecimal is exactly 0 in decimal, and FF corresponds to 255 in decimal. So 00 to FF makes 256 different values — that's what two hexadecimal digits give you.

Let's look a little at how these compose. All zeros is, of course, black: no red, no green, no blue. What's this? All red. And what's this now — yellow? No: no red, all green, all blue is cyan. If we want yellow, we need all red, all green, no blue. That gives us yellow. It's actually simple, isn't it? So how do we get gray? Some mid-gray: gray is roughly halfway between black and white, so something intermediate with the same value in every position — say 888888. That's gray. If we want a lighter gray, we add a little to that — say C8C8C8. Once you get the hang of this, it's much, much easier and faster to specify and work with colors this way than to remember whether this is gray5 or gray7, or whether we call it misty rose or peach puff or petrol. It also liberates us from the standard red, green, blue, yellow, magenta, cyan that make for the absolutely garish default R plots you sometimes see posted, and sometimes in the literature, where you can immediately tell that the person either was color blind, or had a very difficult childhood, or is not somebody prone to the aesthetics of data. We'll probably talk a little more about color; it's really important. If you ever want to get something published, don't underestimate how important it is to make beautiful plots. If reviewers see that somebody has taken care in producing their plots, they're much more likely to trust the data analysis as well, because it's an indication of the care put into the work. Now, that doesn't mean you should do shoddy data work and then fluff it up with spiffy plots — on the contrary, take to heart to put the same care into your analysis that you put into your plots.
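Here's a quick toy demo of the composition rules — my own sketch, not something we typed in class: one bar per color, labeled with its hex code.

    colDemo <- c("#FF0000",  # all red
                 "#00FF00",  # all green
                 "#0000FF",  # all blue
                 "#FFFF00",  # red + green: yellow
                 "#00FFFF",  # green + blue: cyan
                 "#888888",  # a mid-gray
                 "#C8C8C8")  # a lighter gray
    barplot(rep(1, length(colDemo)), col = colDemo,
            names.arg = colDemo, cex.names = 0.7)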
Anyway, that is how this color was generated: it's a bar plot with a kind of royal blue — a lot of blue with a little green behind it. And let's use a different color for the second plot. What did I choose here? Something nice: DD0055. Now it already becomes a lot easier to distinguish which is our observed and which is our reference plot. But they still overlap — isn't there something we can do about that? The answer is yes. When I said that colors in R are specified as a triplet of red, green, and blue, I was omitting a small part: you can also specify something else — transparency.

So let's do this again; let's use an orange here — full red and a fair amount of green. We can make this partially transparent by adding a fourth pair of digits, where 00 means completely transparent, FF means completely opaque, and something in the middle means something in the middle. A high value like AA is only partially transparent — see the difference? A value like 33 is mostly transparent, much paler, and a value of 11 is almost invisible. If we now add transparency to our bar plots, things look a little different: we append, say, 55 to each color — transparent blue, transparent red — and now we can actually see the overlaps.

OK. Now, one thing that had bothered us is the size of the axis labels. In any kind of plot, changing font sizes in R has something to do with a value related to cex — a scaling factor. What we want to set here is cex.names. The default is 1, so let's set it down to, say, 0.3. Tiny, tiny, tiny — but they're all there now. We don't need them that tiny, so let's make them a bit larger: 0.67? Oh, that's too large; some of them get swallowed. Still some get swallowed. Trial and error here. That seems to work at this scale and resolution — it depends on how large the plotting area is; at home I usually have a larger monitor, so I can get away with larger font sizes. All of which is moot for the plot we add on top: there we actually need to turn the names off, and the thing to do is axisnames = FALSE.

All right, that gets us a little closer. Do you see what could go wrong with this axisnames = FALSE? The orderings could be different, right? That's a very good point. At some point when you make this plot, you need to verify that the ordering of the two tables you're plotting is the same. Specifically, if we change the ordering — for example, so that the observed plot is ordered by numerically decreasing values — we have to be very careful to use exactly the same ordering for the reference plot, to make sure we're comparing apples with apples, i.e., the same dinucleotides with the same dinucleotides.

That leads me to two somewhat related R functions which are both important and often confused: one is called sort() and one is called order(). Now, sort() is relatively straightforward: if I sort tObs, by default I go from the smallest value to the largest. If I want to go the other way, from largest to smallest, I need to set the parameter decreasing = FALSE — sorry, TRUE; TRUE, of course, it's already decreasing. So now I get this ordering. If I do the same for tRand, I also get largest to smallest. But exactly as you said: if I simply plotted them that way, I would be overlaying the TA bar with the TT bar and the AA bar with the AT bar. That would be absolutely meaningless — a cargo-cult plot; we wouldn't be comparing the same things at all anymore. So we have to have some way of ensuring that the order is defined and stays the same in both of our plots.
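In a sketch, the distinction that matters here:

    sort(tObs)                      # values from smallest to largest
    sort(tObs,  decreasing = TRUE)  # largest to smallest; the names travel with the values
    sort(tRand, decreasing = TRUE)  # also decreasing -- but ordered by tRand's own values,
                                    # so overlaying these two would mix up the dinucleotides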
This is where, once again, subsetting by an index vector comes to our rescue: we can define a names vector which arranges the values in a particular order, and if we apply that same names vector in both of our plots, we guarantee that we're seeing the same arrangement on both sides. For example, if we take the names of the table in its sorted order, we get TT, AA, CC, CA, CT — exactly the names we have here — and we can assign that to a variable. If we then select the values of tObs by these sorted names, we get them in the right order. And if we do the same for tRand, they're not uniformly decreasing — why would they be? — but they are in the right order: TT, AA, CC, CA, and so on. So by defining a vector of names in a particular way and applying that same vector to both plots, we guarantee that the bars we're overlaying actually correspond to the same dinucleotides. Like so. This is the observed data in decreasing order, and this is what we have in our shuffled nucleotides.
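Putting it all together — a sketch of the target plot; the blue hex value and the cex.names setting are stand-ins for what was tuned on screen:

    ord    <- names(sort(tObs, decreasing = TRUE))  # "TT" "AA" "CC" "CA" ... : one fixed ordering
    myBlue <- "#0000DD55"                           # stand-in royal blue, alpha 55
    myRed  <- "#DD005555"                           # the DD0055 from before, alpha 55

    barplot(tObs[ord]  / sum(tObs),  col = myBlue, cex.names = 0.5)
    barplot(tRand[ord] / sum(tRand), col = myRed,  add = TRUE, axisnames = FALSE)
    legend("topright",
           legend = c("observed", "shuffled"),
           fill   = c(myBlue, myRed))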
All right — interpretation time. What does this mean? What do we see here? What does this plot teach us? With the legend added, we have observed values and randomized values for particular dinucleotides. They're kind of similar, but not all the same. Remember, the individual nucleotide frequencies are exactly the same in the two data sets: there are exactly as many T's and exactly as many A's in our shuffled data set as in our actual sequence. But we see more TT's, and more AA's, than in our shuffled nucleotides. What does this tell us? The way I would interpret it: if we have a T, there's a higher-than-random chance that we have a run of T's; if we have an A, a higher-than-random chance of a run of A's. So whenever T's and A's occur, there's an elevated chance of them being part of a poly-T or poly-A run. That's the high end; similarly for CC's, and slightly so for GG's. Apparently, in real genomic sequence, one of the non-random features is runs of identical nucleotides. What could the reason be? Probably replication — slippage of the polymerase: as the sequence is replicated, the polymerase loops and repeats the same base over and over. Simply from looking at this, I would predict, as a molecular biologist, that there's potentially a mechanism that expands single-nucleotide runs through polymerase slippage during replication — and now I can go to the lab and try to design an experiment to verify or disprove that hypothesis.

But the largest, and most significant, difference is at the other end. What does this mean here — the rightmost bar? How do you interpret it? First of all, what does it mean: what's more, what's less? You're seeing less in the observed data. Right. So this dinucleotide appears much less frequently than we would expect simply from the nucleotide composition of the sequence. Anything striking or interesting about CG? It's not the same as GC — it behaves very differently from GC. Interesting, isn't it? GC is randomly distributed, but CG is not. It's biology. Have you ever heard of CpG? What is CpG? Something involved in regulation? Yes — these are regulatory sites. The "p" in CpG stands for the phosphate between the C and the G, so a CpG is just a CG dinucleotide. So there's a mechanism here that makes CG's — CpG's — much, much less frequent in our sequence than we would expect by random chance. Well, presumably because they have a regulatory role: if they arise from some kind of mutational event in the wrong place, they're selected against. There's evolutionary pressure against CG's in random places; they're localized in particular places. And if you're so inclined, you now know how to do this: go and compute the local dinucleotide frequency of CG, plot it along the 100,000 nucleotides, and you'll probably see peaks that correspond to regulatory segments. But this is already striking: there's an evolutionary mechanism that removes CG's — CpG's — from the wrong places. So that's a lot we can do with only 100,000 nucleotides; I promise you we can do even more. But that's something for tomorrow.

We'll end for today — no, we're not going to end for today. Lauren and Greg have prepared — just Lauren; just Lauren has prepared — an integrated assignment that I believe is focused on ggplot2, which is a different way, and a different philosophy, of plotting in R. Quite cool, quite nice. And we'll start with that. So you have a little time to stretch your legs, maybe grab something to eat, but not enough time to wander too far. When will we reconvene? At 5:30. At 5:30 — and how long will it go? Open-ended? Yeah. Midnight. Midnight — Greg will be here until 2 in the morning. Remember, Anna has already warned you: if you wander out to the washroom and don't take a washroom pass, you can't come back. But right now, because a lot of you will be going to the bathroom, I'll be at the door and let you back in for the next 10 minutes; after that, I'll check every so often, OK?

So I'll see you back here in good spirits tomorrow. Digest this. Think for yourselves about what you've learned — not just what you've learned, but what was important to you. In my courses I've started having my students write one page of insights for every course, where they simply note down things they found annoying, or things they found cool, or things they can emotionally relate to in the course material. This is actually really important — not so much because I read these things, though I am curious about what's in them, but because it's important for you, the students, to recapitulate in your mind what just happened, what we went through. That's an important part of the learning experience. You may have questions — actually, I expect you'll have questions when you look at the code again — and we'll have time tomorrow morning to go over questions and clarify material. If something is absolutely unclear to you, don't ever let that pass. You are here, and you're taking this opportunity, to pick our brains and make it work. It's designed so that it can work. So don't be intimidated. Right — and I'll see you back tomorrow.