Hi everyone, I'm going to be talking about the Truffle Debugger and how it keeps track of mapping keys. So this is a thing the Truffle Debugger does, right: the Solidity language famously does not let you enumerate the keys of a mapping, and in the debugger, when we're displaying variables at a given point in the transaction, we want to be able to show what's contained in the mapping. That requires knowing what the keys are, and while that's not possible in general, we can at least keep track of the keys that were used in the given transaction. And we can even do this when we've got nested mappings, like here: I've got this function that does a bunch of crazy things with mappings nested inside other things, but if we run this in the debugger and skip to the end, it's all there; you've got this mapping inside this struct, inside this array. But the nested mapping thing is actually not what I want to focus on right now, so let's close that down. I want to focus on something that looks a lot simpler. Let's look at this test. It sets up a bunch of things in mappings that look very straightforward: we've got boolean mappings, some bytes mappings, some integer mappings, an address mapping; and if we skip to the end, we can see it's all there. The surprising thing is that, let me close this down, keeping track of all this is a bit more complicated than you might expect. So the first question is: how do we do it at all? The basic answer is what I've typed up on the right here; on the left I've got the relevant part of the code, which I'm afraid is rather opaque. The basic version is this: the Truffle Debugger, in order to tell where we are in the code, uses the source maps that Solidity provides.
We have the bytecode, we have the source map, and we know where we are in the bytecode, so we can find where we are in the source. The other thing the debugger uses is the AST, the abstract-syntax-tree representation of the source that Solidity provides. So if we know where we are in the source, we can map that to a particular node in the AST, and we can therefore use various information about that node. The basic version of how this works is: whenever we're processing a node that corresponds to an expression, we store the top word of the stack as that node finishes, and we associate it with that node (as well as with the current stack frame and so on). I said the top word of the stack; some types use up more than one word on the stack, so really it's the top n words of the stack as appropriate, one or two. And then whenever we get to an index access for a mapping, that is, whenever we're accessing one of the mapping's values, we look at the expression that is the index, decode it, and say: okay, that's our mapping key, let's associate it with that mapping. So this is pretty simple, and it works fairly well. Like I said, it's a pretty good system that Nick came up with, but there are some wrinkles in it. First off, what do I mean when I say "associate it with that mapping"? You see I have sort of text slides here. An early version associated it with the mapping's AST node, which obviously has various problems. So what I actually mean is: associate it with the contract address and the storage slot. How do we get the storage slot? Same mechanism: when we pass over the node that corresponds to the mapping itself that we're indexing into, we store the top word of the stack, and that's our pointer.
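The basic mechanism just described can be sketched roughly like this. This is a hypothetical, heavily simplified illustration, not the debugger's actual code: the node shapes, the `onExpressionFinished` function, and the bookkeeping maps are all assumptions made for the sake of the sketch.

```typescript
// Simplified AST node shape; the real Solidity AST carries far more fields.
interface AstNode {
  id: number;
  nodeType: string;          // e.g. "Identifier", "Literal", "IndexAccess"
  baseExpression?: AstNode;  // for IndexAccess: the mapping being indexed
  indexExpression?: AstNode; // for IndexAccess: the key expression
}

// Top-of-stack word recorded as each expression node finishes evaluating.
const wordByNodeId = new Map<number, string>();

// Observed keys, grouped by the word identifying the mapping's storage slot.
const keysBySlot = new Map<string, string[]>();

// Called at the step where the source map points at `node`, with the word
// currently on top of the EVM stack.
function onExpressionFinished(node: AstNode, topWord: string): void {
  wordByNodeId.set(node.id, topWord);

  if (
    node.nodeType === "IndexAccess" &&
    node.baseExpression !== undefined &&
    node.indexExpression !== undefined
  ) {
    // The word recorded for the base expression points at the mapping's slot;
    // the word recorded for the index expression is the raw key data.
    const slot = wordByNodeId.get(node.baseExpression.id);
    const rawKey = wordByNodeId.get(node.indexExpression.id);
    if (slot !== undefined && rawKey !== undefined) {
      const keys = keysBySlot.get(slot) ?? [];
      keys.push(rawKey); // in reality, decoded according to the key's type
      keysBySlot.set(slot, keys);
    }
  }
}
```

The wrinkles discussed in the rest of the talk are, in effect, all the cases where the raw key data never appears on the stack at a point this simple scheme can observe.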
And we actually do a bit more than that in order to keep track of some other information, but I don't want to go into that here, because then I would not have time for the rest of the talk. So we have this basic version, and now we start getting to the wrinkles, the reasons why handling all these things is not actually so simple. So, oh boy, I'm kind of about to spill the secret sauce here, I guess, but hey, our debugger is open source, so it's no secret. Anyway, what about these string literals here? That doesn't seem like a problem, they're just string literals, right? Except, the thing is, think about the process I described for how we handle these things. In order to get the information for the index node, we have to step through that node at some point; there has to be some actual EVM instruction in the bytecode that is source-mapped to that range of the code. And for some nodes, that doesn't happen, and string literals are an example of this. I say string literals; I also mean hex literals, which Solidity considers to be essentially the same thing. So what's our solution? Let me actually go into the code here, into the mapping-key decoding. Oh yeah, we've got this tiny bit of code here for cleaning booleans, to handle out-of-range booleans and make sure they get treated as true; I'm not going to go any further into that. And, by the way, you may notice: why is there a loop here? We're decoding mapping keys, right? What I described doesn't involve any sort of looping process, so why is there a loop? We're going to get to that. But one thing at a time: in order to handle string literals, we've got a special case before the main case down here; this here is the main case.
But before that, we've got this special case to handle string literals, and actually some other things too, if you go ahead and read the comments I've written there: if our index is a simple constant, which includes string literals, we don't use the stack at all; we just read the information straight out of the AST. This object here is one of our internal pointer objects, which we use for indicating data locations, such as a spot on the stack, a spot in memory, and so on. And in this case, it represents not a spot in memory or storage or wherever, but a spot in the AST that we're going to read the data out of. So that handles string literals. Here's our revised process: we do what I said before, every time we process a node we put aside the top word of the stack (or the top two words), and when we get to the index access, we read the data for the index off the stack and decode it, and that's our mapping key; except for string literals, where we read it out of the AST itself. The next problem is what I already said: not all types take up just one word on the stack. That's easy enough to handle; some types take two words on the stack, and I already mentioned that, so let's move on. Okay, so we've got a revised version of what I said, but there's another problem: decoding pointers. I skipped over this before; I said, oh, let's just decode the data from the stack. Now, what if it's a pointer to memory, or to storage, or to calldata? By itself, that's not a problem; our decoder knows how to handle that. It knows to go look up the appropriate information in memory or storage or calldata. But there is a little bit of a problem with making sure it knows the right type here. If we look at the code here, we see this crazy thing where I've got a spliced definition. Sorry, some of these names are a little out of date and misleading; the "definition" here might be better thought of as a node.
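The special case just described amounts to choosing between two kinds of pointer for the key. Here is a hypothetical sketch of that choice; the pointer shapes and the `keyPointerFor` function are made up for illustration, and the debugger's real pointer objects are considerably richer.

```typescript
// Hypothetical pointer shapes: a key's raw data is read either from the
// stack or straight out of the AST node, depending on whether the index
// is a simple constant (e.g. a string or hex literal).

interface LiteralNode { nodeType: "Literal"; value: string; }
interface OtherNode { nodeType: string; }
type IndexNode = LiteralNode | OtherNode;

type KeyPointer =
  | { location: "stack"; from: number; to: number }    // top n stack words
  | { location: "definition"; definition: IndexNode }; // read from the AST

function keyPointerFor(
  index: IndexNode,
  stackTop: number,
  words: number
): KeyPointer {
  if (index.nodeType === "Literal") {
    // String and hex literals may never be stepped through as instructions,
    // so we point at the AST node itself and decode its value from there.
    return { location: "definition", definition: index };
  }
  // Otherwise the key's raw data is the top `words` words of the stack.
  return { location: "stack", from: stackTop - words + 1, to: stackTop };
}
```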
We've got a spliced node, where we take two different AST nodes and kind of splice them together. And why on earth would we do that? Well, we want to decode, right? Obviously, Solidity is a statically typed language: if we want to decode data, we need to know in advance what type we're decoding. And if we have a mapping whose keys are strings, the AST will report its keys as being memory strings. But we might be handing the decoder a pointer to a storage string or a calldata string. Now, in general, we want to decode according to the type of the mapping key, not the type of the particular index expression, because there might be an implicit conversion. But if the type of the mapping key says memory string, and we're actually dealing with a storage string or a calldata string, we're going to get nonsense: the decoder is going to read the pointer as entirely the wrong type of pointer, and it's going to look in entirely the wrong place. So we have to do this sort of crazy splicing thing, where we take the node that defines the mapping's key type, and splice onto it the data location of the actual index expression. That's what this splice-location function here is for. All right, we've handled that, so we've got our revised process: we decode the word from the stack, but for string literals we read the value from the AST instead, and we also have to do the splicing thing. But what if the key is a constant state variable? Because remember what I said earlier: in order for our process to work, the key has to be source-mapped to some actual instruction at some point; we have to actually step through that key. And for constant state variables, this doesn't happen.
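The splicing idea can be pictured something like the following. This is a speculative sketch only: it assumes Solidity-style type identifiers such as `t_string_memory_ptr`, and the `spliceLocation` name and shapes here are illustrative, not the debugger's actual representation.

```typescript
// Hypothetical: keep the mapping key's declared type, but swap in the data
// location of the actual index expression, so the decoder follows the right
// kind of pointer (storage vs. memory vs. calldata).

interface TypeDescriptions { typeIdentifier: string; typeString: string; }
interface TypedNode { typeDescriptions: TypeDescriptions; }

type DataLocation = "memory" | "storage" | "calldata";

function spliceLocation(keyDef: TypedNode, location: DataLocation): TypedNode {
  // Replace the location segment of an identifier like "t_string_memory_ptr"
  // while leaving everything else (the declared key type) intact.
  const spliced = keyDef.typeDescriptions.typeIdentifier.replace(
    /_(memory|storage|calldata)(?=(_ptr)?$)/,
    `_${location}`
  );
  return {
    ...keyDef,
    typeDescriptions: { ...keyDef.typeDescriptions, typeIdentifier: spliced },
  };
}
```

The point of the design is that the declared key type still drives the decoding (so implicit conversions are respected), while only the location, the one piece the declaration gets wrong, comes from the concrete expression.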
Now, if the constant state variable is part of a larger expression that makes up the key, that's fine. But if it is the entire key, it's just going to get skipped over, and it won't work. So what we have here is the case for handling constant state variables, and this one is kind of a nesting nightmare, to some extent. If the index is a constant state variable, we have to look up the definition of that constant state variable: from the index node, we follow the reference over to the AST node where the variable is defined, and then we can read the information out of that; assuming it's a fairly simple constant, anyway. If you do something crazy in your constant definition, we might run into some trouble and fail to keep track of the key, but I'm assuming that's a fairly minor case; regardless, I like to be comprehensive. So we can handle most constant state variables. Okay. So we can handle ordinary keys, we can handle pointers, we can handle string literals, we can handle constant state variables. What if the key is a hexadecimal literal, and the key type is bytesN; that is, what if it's being used as a bytesN rather than as an integer? Which is something I made sure is handled here. Now, why would this be a problem at all? Well, here's the thing: a hexadecimal literal, from Solidity's point of view, is an integer. So when we put the raw key data on the stack, it's going to go on the stack as an integer: left-padded, right-aligned. If you do this with 0xff here, it's going to go on the stack with the ff at the end. But a bytes1 is a byte string, not an integer, and those are right-padded, left-aligned. So the actual value of the mapping key that gets used is different, and if we didn't have any special handling...
...we would record the wrong mapping key, and we'd be looking in the wrong place. The decoder would just look at the first byte here and say: oh, I guess the mapping key is 0x00. And that's not correct. If we go back to the code here, you may have noticed a while back, when I was talking about string literals, that the code here isn't just for handling string literals; it's for handling hexadecimal literals too, and really any numeric literal. And there's some special stuff in the decoder so that it knows, when it's reading a numeric literal out of the AST (both string and numeric literals get read from the AST instead of from the stack), that if it's decoding it as a bytesN, it has to shift it appropriately.

[A question from the audience:] You said that the constant is put right-aligned onto the stack. Is it then discarded when the mapping location is computed?

No, of course not; obviously it has to be left-shifted in order to be used. You're right that it has to be left-shifted, so the shifted value does go onto the stack at some point; the question is whether it goes onto the stack at a point where we can capture it. This actually gets into some stuff I skipped over. Normally, we assume that after the last source-mapped instruction for a node, the next thing on the stack is what we want. There are actually some cases where that's not true, because execution goes into unmapped code, and so we have to look ahead to the next mapped instruction.
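The alignment mismatch being described here is easy to show concretely. The sketch below is illustrative only (these helper functions are hypothetical, not debugger code): a hex literal like `0xff` sits in a 32-byte EVM word right-aligned, as an integer, while a `bytes1` key is left-aligned, so the decoder must shift the literal's value left when decoding it as bytesN.

```typescript
const WORD_HEX_CHARS = 64; // a 32-byte EVM word, as a hex string

// 0xff as it sits in a word when treated as an integer: left-padded
// with zeros, value at the right end.
function asIntegerWord(valueHex: string): string {
  return valueHex.padStart(WORD_HEX_CHARS, "0");
}

// Converting the integer-form word into its bytesN-form: take the low
// n bytes and move them to the front, zero-padding on the right.
function shiftForBytesN(integerWord: string, n: number): string {
  const valueHex = integerWord.slice(-2 * n);
  return valueHex.padEnd(WORD_HEX_CHARS, "0");
}
```

Without the shift, a `bytes1` key written as `0xff` would be recorded as `0x00…`, the first byte of the integer-aligned word, which is exactly the wrong-key failure described above.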
When I say unmapped, I really mean mapped to -1, you know, Solidity-internal stuff. So it's actually possible that this case could have been handled by that look-ahead system; I haven't checked, because that system actually came later chronologically, so I never thought to check it. Regardless, in this case, reading the value out of the AST is how we handle it. It's possible we could handle it instead by looking a little bit ahead, but I haven't checked; the problem is that it has to be at a point where we can know: aha, this is where it is. So, yeah. Anyway, there is one remaining problem, and you'll notice I still have not answered the question: why on earth is there a loop here? So let's skip down to the bottom of this loop, and you'll see why there's a loop here; or, a better way of saying it, here's the example: this address mapping. For a while, what would happen is that this first assignment would work fine; we'd keep track of the key. But the second one wouldn't. I thought, is there a problem with addresses? No, the problem has nothing to do with addresses. The problem has to do with certain type conversions. This conversion to address didn't work as a mapping key (it does now, obviously, but it didn't), because the conversion from a contract type to address, at the EVM level, is a no-op. There's no actual EVM instruction that corresponds to the address conversion, which means there's no instruction for it to get source-mapped to, which means, once again, no value gets stored. And you can see this with other type conversions as well: converting to bytes1, say, would work fine, because that type conversion involves a shift; but conversions like address(this), those are no-ops, so the problem would occur. So this is kind of the opposite of the hexadecimal literal problem.
This problem is one where there is a conversion in the source code, but no actual instruction in the EVM; whereas with the hexadecimal literal, there was a conversion instruction in the EVM, but no corresponding node in the source code. And there are several sorts of type conversions where this happens, where at the EVM level the conversion is a no-op. So how do we handle this? Well, if our decoding has failed thus far, and the node type for the index expression is a type conversion, then we look inside the argument of that type conversion and try again. We have to make sure that decoding has failed thus far, because if it's a type conversion that isn't a no-op, we don't want to do this; in that case, things will work already. And similarly with unary plus, if you're using really old Solidity: we'll look inside the unary plus, and then we'll go back to the top of the loop and try the whole thing again. And that's how we're able to handle these no-op type conversions; that's why there's a loop in mapping-key decoding. And with this revised process, with all these wrinkles, we are able to handle and keep track of basically any mapping key you can throw at us, except maybe really complicated constant state variables, but that's not a common case. We can handle pretty much any mapping key you can throw at us, and that is how the debugger does it.
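The retry loop just described can be sketched as follows. Again, this is a hypothetical illustration, not the debugger's actual loop: `tryDecode` stands in for the whole decoding attempt described earlier, and the node shapes are simplified (though Solidity's AST does represent type conversions as `FunctionCall` nodes with `kind: "typeConversion"`).

```typescript
// Simplified node shape for the index expression.
interface AstNode {
  nodeType: string;         // e.g. "FunctionCall", "UnaryOperation"
  kind?: string;            // for FunctionCall: "typeConversion", etc.
  operator?: string;        // for UnaryOperation: "+", "-", ...
  arguments?: AstNode[];    // for FunctionCall: the conversion's argument
  subExpression?: AstNode;  // for UnaryOperation: the operand
}

// `tryDecode` returns undefined when no value was ever captured for the
// node -- which is what happens for no-op conversions like address(this).
function decodeKey(
  index: AstNode,
  tryDecode: (node: AstNode) => string | undefined
): string | undefined {
  let node: AstNode | undefined = index;
  while (node !== undefined) {
    const decoded = tryDecode(node);
    if (decoded !== undefined) {
      return decoded; // decoding succeeded; no unwrapping needed
    }
    // Decoding failed: peel off one no-op wrapper and loop around again.
    if (node.nodeType === "FunctionCall" && node.kind === "typeConversion") {
      node = node.arguments?.[0];
    } else if (node.nodeType === "UnaryOperation" && node.operator === "+") {
      node = node.subExpression;
    } else {
      node = undefined; // nothing left to unwrap; give up
    }
  }
  return undefined;
}
```

Note that unwrapping only happens after a failed attempt, matching the point made above: a conversion that does emit an instruction (like one to bytes1) decodes on the first pass and is never unwrapped.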