Greetings, FPGA friends. It's time to try some small optimizations to see if we can get the LUT count down. So one thing that I'm looking at, at least in the code, is all of these address plus ones, right? If I search for just plain old plus one, I see that there are 44 of them. And if I search for minus one, I see that there are 31 of them. Right here you can see that we're subtracting one from X; that's just a 16-bit subtract. But there are a lot of address minus ones. And some of them are SP minus ones, which we should probably turn into address minus ones, given that we've seen that when you increment or decrement the stack pointer, that usually happens by sticking the stack pointer onto the address lines and then incrementing or decrementing those. The other thing that I noticed is the intermediate language of the CPU as it gets output. The intermediate language it uses is based on cells, and each cell has a function that it performs, along with input and output parameters. So this is fairly understandable. We're doing an equality cell, I think. The width here, you can see, is 8 bits, so A is 8 bits, B is 1 bit, and the output is 1 bit. Okay, I'm not sure what this one is doing, but let's take a look at something else. Okay, here we go, this is a little more understandable. Here's an add cell. The input A is 4 bits, input B is also 4 bits, the output is of course 5 bits, and here is where you do the connections. So this is kind of like the netlist: input 1 and input 2 go to A and B, and the output goes to some unnamed wire called $8. So essentially this is a 4-bit adder cell. So let's see if we can find the adder cells. Here's another adder cell; you can see it's another 4-bit one. This one adds 5 bits to 1 bit to make 6 bits. This one down here is 3 bits plus 3 bits, making 4 bits. 
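The width pattern in those cells is easy to model in plain Python. This is just a behavioral sketch of what an adder cell does, not the actual intermediate-language output; the function name is mine:

```python
def add_cell(a, b, width_a, width_b):
    """Behavioral model of an adder cell: unsigned A + B, with a
    result one bit wider than the widest input so the carry fits."""
    width_y = max(width_a, width_b) + 1          # e.g. 4 + 4 -> 5 bits
    mask = (1 << width_y) - 1
    return (a + b) & mask

# 4-bit + 4-bit -> 5-bit, like the cell in the dump
print(add_cell(15, 15, 4, 4))   # 30, which needs all 5 output bits
```

The same rule explains the other cells seen in the dump: 5 bits plus 1 bit gives 6 bits, and 3 bits plus 3 bits gives 4 bits.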
Let's just scroll through these. And what I really want to see... okay, here's an 8-bit adder. Another 8-bit adder. Here's 8 bits plus 8 bits. Ah, here we go. Okay, so this is a 16-bit adder, and we're adding 1 bit to it. So this is probably the plus-one thing, right? And in fact you can see that A gets hooked to address, and B gets hooked to 1, and there is the intermediate output. So there's a 16-bit adder right there. Here's another 16-bit adder; it looks like we're doing PC plus 1 here. Here's another PC plus 1, address plus 1, address plus 1, address plus 1, address plus 1. So you can see all of these adders are address plus 1 or PC plus 1. Look at all that. That's a lot of adders. Is there a subtract cell? There is. We can see here there's a subtract cell with address minus 1. So what I want to do is try to get rid of a lot of these cells and see if that makes any difference in the output. I'm not sure what Yosys actually does in terms of optimization. Maybe it recognizes that a lot of these cells are the same and just coalesces them. Who knows? So what I'm going to do is make one simple change. If we look for address plus 1, we can see that there's a whole bunch of them. So I'm going to make a separate resource called a 16-bit increment/decrement unit, and I'm going to pass all of these functions into that one single resource. The theory is that there is only one adder and we just multiplex inputs and outputs into it. So let's see what that looks like. Okay, that's pretty straightforward. I define an enumeration for the function that I want to perform: do nothing, increment, or decrement. We've got an input, an output, and a function. 
And we just basically switch on the function, combinatorially: input plus 1 goes to the output, or input minus 1 goes to the output, or do nothing; and if it's unspecified, like for example function 3, just set the output equal to the input. That's all there is to it. So to hook it up into the core, I basically just instantiate one module of the IncDec16 unit. I set the default of the function to do nothing, and I set up the connections: our function gets copied into its function, our input gets copied into its input, and its output gets copied into our output. So now let's use it. Here's our RTS instruction, and we can see that there are two address increments in here. What I'd like to do is replace this with something like this; this is the syntax that I'd like. I want to do this on phase one, to stress that this isn't combinatorial. And I want to set up the signals so that I take in the address and give out address plus 1. If we go there, we can see that this is what I've done: I take in the signal, and I give out a statement. Combinatorially, I set up the function to be increment and the input to be whatever you told it to be, and then the statement that I return is just that the output IO is equal to the unit's output. And of course, you could use that combinatorially, or you could use it on phase one; it's whatever you want to do. So now let's go ahead and replace the other one. Okay, that's all I'm going to do. And then to make sure that this is going to work, I'm going to run formal verification on RTS. Yeah, I did notice that error. And this error as well. 
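As a behavioral model, the shared IncDec16 unit is just this. This is plain Python, not the actual nMigen module; the enum and function names here are mine:

```python
from enum import IntEnum

class Func(IntEnum):
    NONE = 0
    INC = 1
    DEC = 2

def inc_dec16(func, value):
    """Behavioral model of the shared 16-bit increment/decrement
    resource: one adder/subtractor, with the operation selected by
    an enum. Any unspecified function (e.g. 3) passes the input
    straight through to the output."""
    if func == Func.INC:
        return (value + 1) & 0xFFFF
    if func == Func.DEC:
        return (value - 1) & 0xFFFF
    return value  # NONE or unspecified: output = input

print(inc_dec16(Func.INC, 0xFFFF))  # 0 (wraps around at 16 bits)
print(inc_dec16(Func.DEC, 0x0000))  # 65535
```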
And as we can see, formal verification worked, so I didn't break anything, which is good. As a baseline, the current number of LUTs that we're using is 3537, and the number of cells is 3895. So those are the numbers that we want to try to knock down. Let's take a look and see what we've got. Okay, so 3581 LUTs as opposed to 3537, an increase of about 40. The number of cells is 3939 as opposed to 3895, so we've also increased by about 40 cells. So what's going on? Well, first of all, I think that, again, the place and route is a stochastic process. It's basically random, and any small change in the input can result in a big change in the output. So, in fact, I don't know if my optimization really worked. It's unfortunate, but that's how it goes. Okay, so what I think may be going on is that Yosys is pretty smart, and so is nMigen. What it's actually doing is taking all of these networks of logic and comparing them, and if it finds two networks that are identical, it will simply merge them. So it's pretty much the same thing as what I've done here with ALU2. I'm pretty sure that the logic for ALU2 is created identically to every other network of logic for ALU2, and the only difference, of course, is the function that's being performed. So I'm pretty sure that what Yosys is going to do is look at the networks, see that maybe there's only this one difference, combine everything, and just have some sort of a multiplexer for it. So that's one theory, and we can test that. The other idea is maybe it's a particular instruction that's taking up a lot of logic. So for example, I don't know, CPX. Maybe if we eliminated the CPX instruction and instead replaced it with basically a no-op, we would see a significant reduction. 
And that would mean that there's a significant amount of logic being taken up by just that instruction, so then maybe we can target that instruction for optimization. Let me explore that idea first. So here's CPX. Maybe what I want to do here is simply eliminate the CPX instruction. Obviously it's not going to pass formal verification, but I just want to see how much logic is eliminated by removing the CPX instruction. Okay, so just for fun, I've commented out the next instruction, branch to subroutine (BSR). I don't think that's particularly logic intensive, but let's just see how much logic it takes. You can see that I've also put in a comment saying how many LUTs it uses. Obviously, the instructions are not necessarily orthogonal. In other words, if I save 120 LUTs on BSR, that doesn't mean that eliminating both of them eliminates 240 LUTs. It just means that I'm going to eliminate somewhere between a minimum of 120 LUTs and a maximum of 240 LUTs. So let's take a look at the amount of logic "saved". Okay, so we went down from 3537 to 3300. That's interesting. That's about 130 LUTs. So let's go ahead and write that down, put it back in, and, what the hell, let's take a look at the next one and see what happens. Okay, so this resulted in a savings of about 120 LUTs. That's kind of interesting. So the hypothesis now is that by eliminating all three of these instructions, I will be able to eliminate a minimum of 130 LUTs and a maximum of 360 LUTs. Let's see what happens with that experiment. Okay, so there is a very significant amount of savings here, something like 500 or so LUTs, maybe 550. So that's at the top end of my estimate. This is interesting because it leads to the question: if I can somehow optimize these instructions, I may be able to save a significant amount. Now if we look at the CPX instruction, we can see that it's the kind of logic that probably can't be reduced. 
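The bound being used here is simple: because instructions can share logic, removing several at once saves at least as much as the largest individual saving and at most the sum of all of them. A quick sketch of that reasoning (the third per-instruction number is hypothetical, just to make the arithmetic line up):

```python
def combined_savings_bounds(individual_savings):
    """Per-instruction LUT savings can overlap when logic is shared,
    so the combined saving is bounded below by the largest single
    saving and above by the sum of all of them."""
    return max(individual_savings), sum(individual_savings)

lo, hi = combined_savings_bounds([130, 120, 110])
print(lo, hi)  # 130 360
```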
In other words, there's no logic that's common with any other instruction. For example, we're using these special functions in the ALU, CPX high and CPX low, and it's possible that by eliminating the CPX instruction, we also eliminated that path in the ALU, because nothing else uses it. We're also passing the low value and the high value of X into one of the inputs of the ALU, and maybe that's not duplicated anywhere. So it's entirely possible that there are no savings to be had by attempting to optimize CPX, because there's nothing to optimize. Let's look at BSR. Okay, in BSR it looks like there also is nothing to optimize, because again it's completely custom logic restricted to BSR. Now, it's possible that BSR and JSR have some things in common that maybe Yosys just didn't notice, but I'm kind of doubting that. Again, I think that Yosys is pretty smart about that sort of thing. In fact, let's split this file and find JSR. So here's JSR. We can see that there are actually two modes: extended and indexed. Okay, let's see what happens during cycle one. So one commonality we can see is that in JSR, cycles four and five are identical to cycles three and four. So again, presumably Yosys noticed that. Cycle six is the same as cycle five, cycle seven is the same as cycle six, and cycle eight is the same as cycle seven. So in terms of the difference, it's really at the beginning, where we're just reading mode extended. There really isn't any difference there. In terms of indexed, again, it's the same thing: the cycles are pretty much identical, and the only difference is the beginning. So okay, maybe the hypothesis is incorrect and Yosys actually did not see the similarities between these two instructions. Let's see if we can try to combine the instructions. Okay, so what I've done is basically just rename the JSR function to JSR BSR, and if the instruction was a BSR, just do the BSR stuff. 
So that's a very minor change. In terms of the instruction decoding, we just call JSR BSR for both of these things. Now, I don't expect any change whatsoever based on this. I mean, there may be a little bit of a change due to the stochastic nature of place and route, but let's just make sure and see what happens. Okay, I'm actually a little bit stunned, because I didn't think that would have any effect. It turns out that we've saved about 140 LUTs, which is kind of weird to me, because all I did was move some logic around. So I don't really understand that. Okay, let's try a different kind of experiment. These are some really small functions, and the only real difference between them is the flag that you're actually setting. Again, I think that Yosys is probably smart enough to notice that these happen on exactly the same cycle. So what I'm going to do is write a function that combines all of these, and we'll see how many LUTs we can save there, if any. So this is my combination. Basically, because I've already done the switch case on the instruction, the only thing that I need to do is check these two bits. And there's the default case, which is just the other instruction. So let's just see what happens. This is the change that I've made in the decoding part: just call the same function, see what happens. Okay, so that actually increased the number of LUTs. I guess maybe adding a second decoding level did that. So there were really no savings to be had there. Okay, so here's another possibility: increment/decrement X and increment/decrement S. They are, for the most part, the same, or at least they should be. They're slightly different in terms of, of course, the register that they increment and decrement, and the fact that increment/decrement X additionally sets a flag. But you'll notice that I've used different code for each one. 
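For reference, here is a sketch of what that combined flag-instruction decode looks like behaviorally. This is my reconstruction, assuming the standard 6800 flag instructions (CLV $0A, SEV $0B, CLC $0C, SEC $0D, CLI $0E, SEI $0F), where two bits pick the flag and one bit picks set versus clear:

```python
def decode_flag_op(opcode):
    """Sketch of combining the small flag instructions into one
    function. After the main opcode switch has matched $0A-$0F,
    bits 2:1 select the flag and bit 0 selects set vs. clear."""
    flag = {0b01: "V", 0b10: "C", 0b11: "I"}[(opcode >> 1) & 0b11]
    value = opcode & 1  # 1 = set the flag, 0 = clear it
    return flag, value

print(decode_flag_op(0x0D))  # ('C', 1): SEC sets carry
print(decode_flag_op(0x0A))  # ('V', 0): CLV clears overflow
```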
So maybe what I should do is try to combine these two functions so that they're common. I mean, they take the same number of cycles and they basically do the same thing. I can use the address lines in increment/decrement S the way I did in increment/decrement X, and see what happens. Okay, so this is the first phase: I've made both of them look identical except for the register that they use and for the flag-setting bit. So let's just rerun synthesis and see what happens. Okay, well, in doing that I've eliminated 14 LUTs, so that's something. The second thing that I've done is change the instructions to use the address lines to do the incrementing and decrementing. You can see that here we're just storing the address lines back into X, and for S, of course, we're storing the address lines back into SP. Otherwise, cycle two is absolutely 100% identical. In cycle one, the only difference is the register that you put onto the address lines, and in cycle three, the only difference is the register that you store to and the fact that you're setting the flags. So let's see if we can save anything from there, and the answer is yes: we ended up saving about 80 LUTs. That's actually pretty interesting. Combined with the 14 or so LUTs that we saved with the first change, that totals roughly 100 LUTs. So that's pretty much a win. So maybe we should look for other instructions where we can make the cycles turn out absolutely identical; maybe that's a hypothesis we can use. So here's another interesting thing to try. The ALU instruction takes these two basically hard-coded integers, either zero or one. And if we look at the definition of ALU2, we can see that we're using a multiplexer based on those two integers. So I'm wondering if that actually adds anything to the logic, and if we change this to something a little more hard-coded, maybe that would work. 
Okay, so I went ahead and replaced those multiplexers with an equivalent Python expression. So basically that is either going to double the amount of logic used by ALU2 or it's going to save something; I have no idea. Let's take a look. We were at 3367, and wow, we're at 3278 now. So we saved about 90 LUTs just by apparently doubling the logic and letting Yosys find the doubled logic, instead of explicitly using a multiplexer. That's interesting. So let's try to find some more multiplexers. I'm pretty sure that something like this is really not eligible for replacement. The problem is that B itself is not a constant; in fact, B is based on the instruction itself. So I don't think that one is valuable to change. And unfortunately, all the remaining multiplexers are like that: they're not based on a constant. So that's the end of that investigation. One thing that occurs to me is that it didn't really make a whole lot of sense that just replacing this multiplexer with a Python expression would help, since to my mind it doubles the amount of logic: now you've got one bit of logic that does exactly the same thing for operand one set to zero, and then a completely different set of things for operand one set to one. Maybe what's happening is that by splitting the logic up, we're giving Yosys the opportunity to do better optimization. So one experiment that I thought I would do is try to eliminate this "runtime", or synthesis-time, expression. Because basically what I'm doing, I think, is probably forcing Yosys to do something in particular that maybe isn't really the best thing to do. So what I'm going to try is to split up all of the ALU2 instructions in order to hard-code the mode, instead of putting the mode in the actual synthesized logic. Let's just see what happens with that. Okay, so I've replaced those synthesis-time ifs with compile-time ifs. 
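Going back to the multiplexer replacement at the top of this section, the distinction can be sketched conceptually in plain Python. This is only a model, not the real nMigen code; the function names are mine. The point is that when the select is a hard-coded 0 or 1 known while the design is being built, the choice can be made in Python so no mux is ever emitted:

```python
def alu2_with_mux(operand1, a, b):
    """Before: operand1 is a hard-coded 0 or 1, but it still drives a
    synthesized 2:1 multiplexer (modeled here as a select expression)."""
    return b if operand1 else a  # a mux, even though the select is constant

def alu2_specialized(operand1, a, b):
    """After: the 0/1 choice is made in Python at elaboration time, so
    each instantiation emits only the branch it needs; Yosys can then
    merge any duplicated networks on its own."""
    if operand1:  # evaluated while building the design, not in hardware
        return b
    return a

# Both forms compute the same result; only the emitted logic differs.
print(alu2_with_mux(1, 10, 20), alu2_specialized(1, 10, 20))  # 20 20
```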
The thing is that now I have to pass the mode bits for all the instructions, and I've done that here. So I basically have four blocks of instructions where I only had one. And of course, now the matcher doesn't have any don't-cares in it. We can see that I'm explicitly setting mode bits A when these two bits here are zero; when they are one, I'm setting mode bits B, indexed and extended. So those are all the ALU2 instructions. Let's see what happens now. Nope, that was a bad idea: that increased the number of LUTs by 110. Nope, revert, revert. Now, all of these experiments are not going to waste. When we run an optimization that we think will work and it actually makes things worse, let's call that a pessimization. Pessimization is the opposite of optimization. So if we can find code that's already pessimized, in other words code that contains a change we know makes things worse, and attempt to change it back, then we've learned something. So here's something that we might see is already pessimized. We're taking the cycle and, at synthesis time, comparing it to self.cycle. Well, maybe we can optimize that, because we can see that cycle is passed in as an integer. So one thing that I might possibly do, since maybe I've prematurely optimized here, is find all the places that read_byte is used. There are only six places, and I can just replace each one with this code, with the cycle set appropriately. So here's an example of doing that. This is in all of the ALU instructions: after we read the one-byte address for mode direct, we go ahead and read the byte that comes out of that address. So by substituting in the code that was in read_byte, now we are reading, the address is the operand, and we are putting the result in source bus 2. 
Now I can just get rid of this and hope that somebody somewhere will see an optimization that can be done. And I'll do that for all the read_byte uses and see what happens. Okay, so I have replaced all uses of read_byte with their equivalents, and as you can see, there's only one usage left, which is the actual function itself. So I can just get rid of that if I need to. But what I want to do is run the synthesis and see how many LUTs I've saved or not saved by doing that. And the answer is I didn't save or lose any LUTs, which I guess sort of makes sense; the logic is pretty much exactly the same. The interesting thing is that I removed a synthesized comparison, but apparently that got removed by the optimizer anyway, which is pretty cool. In any case, I don't need read_byte anymore, and I never really liked it anyway. Oh, hey, remember the thing that we did by combining JSR with BSR? Maybe I can combine jump with branch. Now, I know that branch is conditional, but there is one condition that's branch-always, so maybe I can get something out of that. So here's basically the same thing that I did with JSR BSR: I've just added an additional if statement. Here I just need to change that to an elif, and it just copies the code into that if. Oh, bad news: that didn't work. That was a pessimization. So I guess we didn't really learn all that much, because we inverted the same pessimization for JSR BSR and there it worked. So you got me. With all those optimizations, small as they were, I decided to run formal verification again, and I did get some instructions that failed. One of them was increment/decrement S. Maybe you were yelling at the screen when I did this, but let's see: DES is $34, and INS is $31. So in fact, the decrement line is actually the opposite. So maybe we've lost some optimization due to that. Unfortunately, we've had a bunch of other failures, including things that I didn't think were very related, like INC and DEC. 
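The bug is easy to see against the opcode table. Assuming the standard 6800 opcodes (INX $08, DEX $09, INS $31, DES $34), the bit that distinguishes increment from decrement flips between the X and S variants, so the same "decrement" decode can't be reused blindly:

```python
# Standard 6800 opcodes for the register increment/decrement instructions.
INC_DEC_OPS = {
    0x08: ("X",  +1),  # INX
    0x09: ("X",  -1),  # DEX
    0x31: ("SP", +1),  # INS
    0x34: ("SP", -1),  # DES
}

def inc_dec_direction(opcode):
    """Look up which register an opcode touches and in which direction."""
    return INC_DEC_OPS[opcode]

# For X, the higher opcode ($09) decrements; for SP it's the other way
# around ($31 increments, $34 decrements), hence the formal failure.
print(inc_dec_direction(0x31))  # ('SP', 1)
```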
I mean, I'm not sure what I did to cause that to fail. I suppose I could understand negate, test, and COM, because those are all ALU-type things, but shift/rotate? That's kind of weird. So I'm not sure what I did wrong. I guess I'll go look at some traces. Okay, do you see the error? Yep, that was supposed to be operand 2. So that probably saved a few gates. Maybe not a whole lot of gates, but a few. In fact, I would like to synthesize this and see how many gates I've lost. So that explains basically negate, test, and COM, I believe. So I basically lost all of the progress that I made, and in fact I made it even worse. Before I changed the multiplexers to those Python statements, I had 3278 LUTs... no, I'm sorry, before I made that change I had 3367 LUTs. So in fact, by changing those multiplexers, I ended up losing. So I am simply going to revert that optimization, which turned out to be a pessimization. I guess the lesson is: put the logic in the synthesizer, or it gets the hose again. And that error would have also explained shift/rotate, because that's all ALU2, and increment/decrement, which is also ALU2. So basically all the ALU2 instructions failed when I made that error. So now that I have corrected the error and put back the multiplexers, we are back to 3344, which is pretty good. It's okay; it's about 190 LUTs from where we started at the beginning of the video. So one thing that I'm finding pretty interesting is that, you know, I said that we've so far saved about 190 LUTs. That's compared to 3500 LUTs in total. So in other words, we haven't even saved 10% of the LUTs; we've probably saved about 6%, which isn't great in my view. 10% is kind of my threshold for what actually makes sense. Sure, you could have five optimizations that each save 2%, but that's still a lot of work compared to one optimization that saves 10%. So I'm still looking for that one 10% optimization, and I'm not really finding it. 
I think one of the commenters pointed out at one point that I'm not even using this source bus thing, with the source bus 1 select and so on, because I felt that that was premature optimization. So I'm wondering if I even need this code anymore. I'm pretty sure Yosys is going to recognize this as dead code, but I'm going to get rid of it anyway. Okay, well, that's interesting. We managed to save 30 LUTs by removing that dead code. So I guess Yosys's dead code elimination either doesn't work quite as well as we thought it did, or I've introduced a bug. Let me run formal verification again. Okay, everything formally verified, so great, I didn't break anything. It took about 20 minutes to do. So one thing that I noticed is this statement right over here. This is a Python statement, and maybe I can save something by turning it into something that is actually synthesized. So let's try it. This is part of the ALU block of instructions, except for this thing, which is not an ALU instruction. So this is the block of ALU instructions right over here. I didn't miss anything, did I? No, doesn't look like it. Okay, so the only places that store is false are compare and bit. Compare is 0001 and bit is 0101. So maybe I could just remove store equals false and set it up so that store is basically "not 0001 and not 0101". Let's try that. Okay, so I've changed the name of the argument to xStore, and I'm not going to be using it. And I've defined a store variable that basically checks whether the bottom four bits of the instruction are 0001 or 0101, which are compare and bit, so they do not do a store. And I've replaced the Python if store with a synthesizable If(store) in four places: here, here, here, and here. So that's a really simple change. There is an error. What's the problem here? Oh yeah, right, these are now... I can change those to xStore as well. They won't have any effect, but let's see what happens. Interesting. 
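As a sanity check of that decode, here is a behavioral sketch. The low nibbles follow the standard 6800 ALU opcode layout, where compare has low nibble $1 and bit test has low nibble $5; the function name is mine:

```python
def alu_does_store(opcode):
    """An ALU-group instruction writes its result back unless it is a
    compare (low nibble 0b0001) or a bit test (low nibble 0b0101),
    which only update the flags."""
    return (opcode & 0x0F) not in (0b0001, 0b0101)

print(alu_does_store(0x81))  # False: CMP A immediate, no store
print(alu_does_store(0x85))  # False: BIT A immediate, no store
print(alu_does_store(0x8B))  # True: ADD A immediate stores its result
```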
I think we made some headway. We were at 3344 and we're now at 3277, so somehow I managed to save about 70 LUTs. That is kind of interesting. Let me see if the same thing works in the ALU instructions... the ALU2 instructions, I should say, because we do have this other store. So maybe I can save something there. Here is the only place that store is false, so it's for the test instruction, 101. Oh, and when I previously said test, I guess I meant compare. Unbelievable. So we've actually ended up increasing. Okay then. So I've been looking at the code up and down, left and right, mainly up and down, and I just don't really see anything obvious that I can target for an optimization. Again, it's kind of difficult, because it's not like C++ optimization, say, for a microcontroller, where you can look at the assembly language and sort of see, oh, okay, that's what it's doing. You can't really see a whole lot, even with the intermediate language, because Yosys goes ahead and processes everything. nMigen does a little bit of optimization itself, but Yosys does the majority of the optimization. So the only thing that I can do is trust Yosys to do the right thing. And when everything is said and done, I've managed to get the number of LUTs down from 3537 to 3277, which is 260 LUTs or so. That's not even 10%. So again, I have spent a lot of hours looking at this, trying different things, and I just cannot achieve even 10% optimization. And that's kind of the limit where I say, well, if you can't achieve 10%, you're wasting your time. That's just a rule of thumb that I have: anything under 10% is just not significant. So I think I'll basically call it a day. I've made a little bit of progress, but not as much as I was hoping to make. And I'm just going to leave it at that and call it done. 
So I suppose the next thing to do would be to check out some of the other signals that I've missed. I know that there are some input signals that the external hardware can use to make requests. So maybe the next thing I'll do is look at some of the signals that I skipped: mainly halt, that's right here, plus three-state control and data bus enable. I've skipped those. They're input signals which do other things to the processor, so I'm going to have to add those. When that's done, then maybe I can put it on an FPGA, hook up basically an EPROM and RAM, put together a tiny little processor, run it through a few small programs, and see if it works at 3.3 volts. And then I'm probably going to take a look at making it 5-volt tolerant. So I think that's about it for this video. Not very happy, but it was an exercise that had to be done. So, after several more hours of chasing micro-optimizations around, I decided to look at the possible options of synth_ice40, which is one of the commands that Yosys runs. I'm looking through the options, and I noticed that there was this list of things that the synthesizer actually does. So I'm looking through this and I'm like, yep, there's some optimization it does. There's another optimization pass. It does a whole bunch of optimization passes, which is nice. Then I saw that under map_luts there was an additional run of ABC, which is a logic reduction engine, and then another pass of the iCE40 optimizer that runs only if the -abc2 flag is set. There's also something else over here, another optimization that gets turned on if you turn on -relut. So I looked at -abc2 right over here, and it is actually not set by default. Same thing with -relut: it's not set by default. So they only run if you ask for them. So I decided to just add that to the code, which you can see right over here. 
It's an option that you pass into the build function under the synth_opts keyword. So there's -relut and -abc2. And I was able to reduce the number of LUTs by about 200 just by adding these two command-line options, which is kind of interesting. That doesn't mean that all of my micro-optimizations were pointless. No, in fact, even with these options, a lot of my micro-optimizations were still valid. In the end, every micro-optimization would result in a reduction of 10, 20, 50 LUTs or so, and it added up. So in the end, we started with 3537 LUTs, and after all the optimizations, including these command-line options, I ended up with 2874 LUTs, which is a significant reduction because it's over 10%. It's something like 650 LUTs, which is definitely over 10%, pretty much almost 20% even. So was it worth the amount of time? Well, I mean, it took a week, many, many hours of just trying different things, and some optimizations which seemed like a good idea resulted in increasing the number of LUTs by 200, which is insane. So I put some comments in the code about where I tried some optimizations. For example, here is a place where I made one optimization which actually saved 60 LUTs, just by eliminating this one statement where it wasn't actually used, where it shouldn't ever be triggered. So that was weird. When I tried the same sort of optimization right over here, well, it didn't work, and it resulted in an increase of 200 LUTs. So maybe it had to do with the fact that this code here is a little bit bigger than this code. It turned out that the optimization was actually a pessimization. I have no idea. The ways of the optimizer are mysterious, and you should generally try not to do any optimizations until you're finally done, and then start seeing if some micro-optimizations will work. 
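For reference, the way the extra flags get through to Yosys is roughly this. This is a sketch, not the exact code; the platform and module names are placeholders, and it assumes the nMigen iCE40 platform forwards synth_opts into its synth_ice40 invocation:

```python
# Sketch: pass extra Yosys synth_ice40 flags through the nMigen build.
# "platform" and "core" are placeholders for the actual platform object
# and top-level module; synth_opts is forwarded to synth_ice40.
platform.build(core, synth_opts="-relut -abc2")
```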
Now, the interesting thing is that if the optimizer ever changes, then this code may actually increase in LUTs or decrease in LUTs. It all depends. So, again, is it worth the effort? I would probably say it's on the margins. I was able to save almost 20% of the LUTs, but unfortunately, it did not make a difference in the chip that I would have to use. Because I'm under 4,000 LUTs and above 1,000 LUTs, I still need an HX4K; I just don't need an HX8K. If I could get the LUTs down to below 1,000, then I could use an HX1K, but that's pretty much impossible. So I think I'll call it a day here. I'm not going to look at any more optimizations. You're welcome to try, and maybe submit a pull request for an optimization that you found, but the key is that the optimization still has to be readable. I don't want any obfuscated-C-code-contest entries here. So I think that's probably about it. The next videos will probably deal with actually putting it onto the iCE40 board, and maybe seeing if I can level-shift the 5 volts down to 3.3 volts, or just running some test programs and getting some LEDs to blink or something. I'm still kind of suspicious that this whole thing just seems to work, and I kind of suspect that maybe I missed something, but I guess we'll see. How's that, Cat?