Hi, I'm Nick. This is Harry. Hello. We work on Truffle, and I'm the debugger guy. First we're going to talk about the current state of smart contract debugging. Then we're going to talk about debugging data formats: what are they, what do they do, and why does every other computing platform use them? Then we're going to talk about some of the design challenges, coming from a Solidity perspective but applying to smart contracts in general. Then we're going to make this interactive. The idea is that, hopefully, we'll break into groups and talk about what one of these debugging data formats would look like, and we'll see if people are not too afraid to share their findings. Then we'll talk about specific areas of work: not the high-level picture, but the specific challenges. We'll break into groups again and present findings again. No one ever wants to open up, so, great. Hopefully not. I'll just try it. And then we'll conclude. So, we'll see how this all goes. Did that already? All right. Great. What's debugging like on Ethereum today? Well, we have Solidity source mappings. These are extremely useful; they are fantastic. You have some bytecode. Solidity gives us bytecode, and we can model bytecode as a list of instructions. Solidity also gives us source mappings, which correspond to a particular bytecode, and we think of them as an array: there's an array of instructions and there's an array of source map entries. Each entry in the source map corresponds to the instruction at the same position and tells you the file, the source range, and some other stuff that I won't get into. Then you can take this source map and the bytecode and look at some running program.
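A minimal Python sketch of expanding a compressed Solidity source map into one entry per instruction, assuming the `s:l:f:j` field layout described in the Solidity documentation. Newer compilers append a modifier-depth field, which this sketch ignores. An empty field, or an entirely empty entry, inherits the previous entry's values.

```python
from typing import List, NamedTuple


class SourceRange(NamedTuple):
    start: int   # byte offset of the range in the source file
    length: int  # length of the range in bytes
    file: int    # source file index (-1 means no source file)
    jump: str    # "i" = into a function, "o" = out of a function, "-" = regular


def decode_source_map(srcmap: str) -> List[SourceRange]:
    """Expand a compressed source map into one SourceRange per instruction."""
    entries: List[SourceRange] = []
    previous = ["0", "0", "0", "-"]
    for raw in srcmap.split(";"):
        current = list(previous)
        # Only the first four fields are used; any extra fields are ignored.
        for i, value in enumerate(raw.split(":")[:4] if raw else []):
            if value != "":  # an empty field inherits the previous value
                current[i] = value
        entries.append(
            SourceRange(int(current[0]), int(current[1]), int(current[2]), current[3])
        )
        previous = current
    return entries


if __name__ == "__main__":
    for entry in decode_source_map("1:2:1;:9;2:1:2;;"):
        print(entry)
```

The compression keeps the map small: long runs of instructions that belong to the same source range cost only a `;` each.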
You can get the trace: you look at the execution of your smart contract transaction and see everything that happened, all the opcodes. And you get this concept called a program counter. The program counter says where you are in the currently executing context. What you do is take the program counter and figure out which bytecode instruction it refers to. So if your program counter is 10, that's byte offset 10 in the bytecode, and you find out what instruction number that is (the two differ because PUSH instructions carry inline data). Then you take that index i, go to the source map, and that tells you what you need to know. So that's pretty handy. The earliest version of Truffle's debugger did only that: it translated from the low-level EVM opcode instructions up to the high-level source. This is what all of these debuggers do. These are the three debuggers that I am aware of for Solidity. I don't think any other high-level Ethereum languages have debuggers today; apologies if I'm missing something. It's hard work to build a debugger, so I don't want to leave anyone out. We have the Truffle Debugger (I'm just gonna have to list that one first), Remix has a debugger, and the Meadow suite has a .NET-based debugger. So what do these do? Well, they figure out how Solidity works through trial and error. That's... yeah, okay, well, Solidity does document quite a lot, like storage allocation. But the program doesn't figure things out through trial and error. No, you built the program, and you figured things out through trial and error. Yeah. So inside Truffle's debugger, how do you figure out what a variable is? Well, you observe it in the wild. And the reason for that is that Solidity has no way of telling us. So hopefully we'll change that. More to add? No, I was wondering. Great. Of course, it's not like Solidity gives us nothing, because then we truly would be stuck. Fortunately, it does give us the abstract syntax tree.
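The program-counter-to-instruction step can be sketched in a few lines of Python. The only EVM fact it relies on is that PUSH1 through PUSH32 (opcodes `0x60` to `0x7f`) are followed by 1 to 32 bytes of inline data; every other opcode is a single byte. The function name is illustrative.

```python
from typing import Dict


def pc_to_instruction_index(bytecode: bytes) -> Dict[int, int]:
    """Map each instruction's byte offset (program counter) to its instruction index.

    PUSH1..PUSH32 (0x60..0x7f) are followed by 1..32 bytes of inline data,
    so byte offsets and instruction indices drift apart.
    """
    mapping: Dict[int, int] = {}
    pc = 0
    index = 0
    while pc < len(bytecode):
        mapping[pc] = index
        opcode = bytecode[pc]
        if 0x60 <= opcode <= 0x7F:
            pc += opcode - 0x5F  # skip the PUSH's immediate bytes
        pc += 1
        index += 1
    return mapping


if __name__ == "__main__":
    # PUSH1 0x80, PUSH1 0x40, MSTORE, STOP
    code = bytes.fromhex("608060405200")
    print(pc_to_instruction_index(code))  # {0: 0, 2: 1, 4: 2, 5: 3}
```

With this table, the source-map entry for a trace step is `source_map[table[pc]]`, which is the whole of the "earliest debugger" lookup described above.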
And so, for those who aren't familiar, you parse the code into a structured tree, which contains all the information and is much more machine-readable and easier to work with than a giant string. We use this for a number of things. In particular, we use it for locating variables: we use it to determine where the state variables are located. And we combine it with the source map. The source map tells us where we are in the source code, treating the source code as a string. But then we combine this with the abstract syntax tree that the Solidity compiler provides, which includes a source range on each node. We use this to determine not only where we are in the source in terms of string indices, but also which node we're on. Using this, we can do a bunch of processing to determine where the local variables on the stack are, among other things. But note that we're not relying only on the abstract syntax tree. We're also relying on our own knowledge of how Solidity stores data, which is not supplied to us in any format. That's just stuff we programmed into the debugger. Yeah, figuring that out slowly. Yeah. How's that sound? Good. You want to build a debugger? How does it work on other platforms? When you compile a C program, how do you debug that? Well, there's this concept called a debugging data format, which comes from the compiler. The compiler provides information about a particular program and gives you that translation: given any machine state, here's how you get the high-level source code state. You have a bunch of bytes. You have a stack. You have memory. You have storage. Here's how you get variables out of that. This is extremely useful. There are a number of these. DWARF is probably the most popular. You can read about it.
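Combining a source-map range with the AST amounts to finding the deepest node whose range contains it. A minimal Python sketch, assuming the compiler's JSON AST in which each node carries a `src` string of the form `start:length:fileIndex`. Real nodes keep their children under many different keys (`nodes`, `body`, `statements`, ...); the generic traversal here is a simplification, and the toy AST in the usage note is hypothetical.

```python
from typing import Any, Dict, Iterator, Optional, Tuple

Node = Dict[str, Any]


def parse_src(src: str) -> Tuple[int, int, int]:
    start, length, file_index = (int(part) for part in src.split(":"))
    return start, length, file_index


def children(node: Node) -> Iterator[Node]:
    """Yield every direct child that looks like an AST node (has a "src" key)."""
    for value in node.values():
        candidates = value if isinstance(value, list) else [value]
        for item in candidates:
            if isinstance(item, dict) and "src" in item:
                yield item


def innermost_node(node: Node, start: int, length: int, file_index: int) -> Optional[Node]:
    """Return the deepest node whose source range contains [start, start + length)."""
    n_start, n_length, n_file = parse_src(node["src"])
    if n_file != file_index or start < n_start or start + length > n_start + n_length:
        return None
    for child in children(node):
        found = innermost_node(child, start, length, file_index)
        if found is not None:
            return found
    return node
```

For example, with a toy tree `SourceUnit (0:100) > ContractDefinition (0:100) > FunctionDefinition (20:60) > Block (40:40) > VariableDeclarationStatement (45:10)`, a source-map range `47:3` resolves to the variable declaration, while `25:2` stops at the function definition.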
It has a 577-page specification. It's a binary format. So you have these concepts. Here's a simple program. These examples aren't related to each other; I just took them from that paper. But you have these tags. There's a subprogram tag which says, oh, this is foo. And there's a variable in it, and the variable has a type and a location. And it even declares types. You can say, we're going to be dealing with ints, so the compiler will say, this is an int. Ints are size 4. I guess it's old. And it's signed. So you do that. Then you say, what's c? c has the base type int, and it has a particular location: an offset from the frame base. So to find it, you look at offset -12 from the frame base. This is pretty useful. But this gets encoded as binary and embedded into the output. If you've used a traditional compiled language and output debug information, with GCC for instance, you pass a flag (`-g`) that says you want debug output. And you get a binary that's twice or three times the size and has all this extra information. On Ethereum, you'd deploy that, pay three times as much, and exceed the block gas limit. You don't want to parse this. But DWARF works pretty well. There are others too; I mention them here. There's PDB, which is Microsoft's; they use it as a proprietary format. It seems to be better. It lives externally, so you get a .pdb file rather than loading the debug information into your actual binary output. COFF is interesting because COFF is an actual binary output format; that's what the compiler gives you regardless. But its specification includes debugging information, so you get this binary, and the binary may or may not have the debugging information in it. Whereas DWARF is bolted on top; it's a separate standard.
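The DWARF example above can be modeled in a few lines of Python to show the moving parts: a tree of entries, each with a tag and attributes, and a location expression that a debugger evaluates against machine state. This is a sketch, not DWARF itself. Real DWARF is binary, and `DW_AT_type` holds a reference to another entry rather than a nested object. The little-endian target and the simulated memory are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DIE:
    """A debugging information entry: a tag, attributes and child entries."""
    tag: str
    attributes: Dict[str, Any]
    children: List["DIE"] = field(default_factory=list)


int_type = DIE("DW_TAG_base_type", {
    "DW_AT_name": "int",
    "DW_AT_byte_size": 4,
    "DW_AT_encoding": "DW_ATE_signed",
})
c_var = DIE("DW_TAG_variable", {
    "DW_AT_name": "c",
    "DW_AT_type": int_type,                  # real DWARF stores a reference here
    "DW_AT_location": ("DW_OP_fbreg", -12),  # frame base + (-12)
})
foo = DIE("DW_TAG_subprogram", {"DW_AT_name": "foo"}, [c_var])


def read_variable(var: DIE, memory: bytes, frame_base: int) -> int:
    """Evaluate a DW_OP_fbreg location and decode the base-type value found there."""
    op, offset = var.attributes["DW_AT_location"]
    if op != "DW_OP_fbreg":
        raise NotImplementedError(op)
    base = var.attributes["DW_AT_type"].attributes
    address = frame_base + offset
    raw = memory[address:address + base["DW_AT_byte_size"]]
    signed = base["DW_AT_encoding"] == "DW_ATE_signed"
    return int.from_bytes(raw, "little", signed=signed)  # assumes a little-endian target


if __name__ == "__main__":
    memory = bytearray(64)
    frame_base = 40
    memory[28:32] = (-7).to_bytes(4, "little", signed=True)
    print(read_variable(c_var, memory, frame_base))  # -7
```

Note that the location depends on a frame base, which the machine provides at run time. That assumption is exactly what breaks down on the EVM, as discussed later.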
If anyone's familiar with ELF binaries, that's the Linux native binary format; that's why it got the name DWARF. Someone was funny. All right, so here are some requirements you might come up with. Naturally, it has to live outside the bytecode, because we pay for gas. It should be easy to generate, because I don't want to make compiler developers' lives terrible. And it must be easy to read, because I don't want Harry's life to be terrible. And it should be extensible, because we don't only want to support Solidity; we also want to support Vyper and all the other smart contract languages. And eWASM is coming, and that's going to break our debugger. So those are some requirements; there are probably others. Anyway, Harry, tell us why it's difficult. Okay, so there are a lot of problems that go into designing one of these. Actually, the most important question is probably the one that Nick put last: what does this format even look like? If we take inspiration from DWARF, as Nick showed you, DWARF uses this idea of declaring types. You can declare types and then have variables of those types. That's really nice, because you're not limited to a few preset types, which would not be very useful. Consider the alternative approach. If you can't declare types, you have to associate a separate type description with each variable, and then you get stuck when you have to handle compound things like arrays. Are you going to associate a type with every entry of a dynamically sized array? No, you can't do that. So this ability to declare types is quite useful, and we will likely want it.
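To make the argument for declared types concrete, here is a minimal Python sketch of a type table in which each type is declared once and referred to by id. A dynamic array names its base type a single time, so elements at any index share it. The table layout, ids (`t0`, `t1`, ...) and field names are hypothetical and do not come from any existing format.

```python
from typing import Any, Dict

TypeTable = Dict[str, Dict[str, Any]]

# Hypothetical type table: every type is declared once and referred to by id.
TYPES: TypeTable = {
    "t0": {"kind": "uint", "bits": 256},
    "t1": {"kind": "array", "base": "t0", "length": None},  # dynamic array
    "t2": {"kind": "array", "base": "t1", "length": 3},     # fixed-size array of t1
}

# Variables point at a type id instead of carrying per-element descriptions.
VARIABLES = [{"name": "balances", "type": "t1"}]


def type_name(type_id: str, types: TypeTable) -> str:
    """Render a declared type in Solidity notation."""
    t = types[type_id]
    if t["kind"] == "uint":
        return f"uint{t['bits']}"
    if t["kind"] == "array":
        size = "" if t["length"] is None else str(t["length"])
        return f"{type_name(t['base'], types)}[{size}]"
    raise ValueError(f"unknown kind {t['kind']}")


def element_type(type_id: str, types: TypeTable, index: int) -> str:
    """Every element of an array shares the declared base type, whatever the index."""
    t = types[type_id]
    if t["length"] is not None and not 0 <= index < t["length"]:
        raise IndexError(index)
    return t["base"]
```

Here `type_name("t2", TYPES)` gives `uint256[][3]`, and `element_type("t1", TYPES, 1_000_000)` gives `"t0"` without the format ever listing a millionth element.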
Although, I don't know much about designing a debug data format, there's a whole set of other challenges. A lot of these points are inspired by Solidity, at least. You want to be more general than that, but these are particular challenges that come up with Solidity. And if we need to solve these just to handle Solidity, then we're certainly going to need to solve them in general. So: location-dependent representation. Solidity represents data differently depending on whether it's in storage, in memory, in calldata, or on the stack. When you declare a type, do you have to describe its representation in all locations, or do we treat these as three or four different types and declare them all separately? And how do we represent the way that Solidity lays out dynamic arrays and mappings in storage, with its keccak-based location scheme? Do you really want to put that in your debug data format? I mean, yeah, we can put a flag for it or something, but it's a little much, maybe. Nick has this whole idea that we should make the debug data format less data-based and more code-based, with code extensions, and this might be a good candidate for that. I picture a Solidity mapping extension, where the Solidity compiler would say, this is a mapping from int to int, or int to address, or whatever, and each debugger builds its own implementation of what that mapping means.
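This is what the keccak-based scheme asks of a debugger. Per the Solidity documentation, the value for key `k` of a mapping declared at slot `p` lives at `keccak256(pad32(k) . pad32(p))` (for value-type keys), and a dynamic array at slot `p` stores its length at `p` and its elements starting at `keccak256(pad32(p))`. Python's standard library has SHA3-256 but not the original Keccak-256 the EVM uses (they differ in padding), so this self-contained sketch includes a compact, unoptimized Keccak-256. The helper names are illustrative.

```python
from typing import List

MASK64 = (1 << 64) - 1

ROUND_CONSTANTS = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]

# Rotation offsets for the rho step, generated by walking the lanes.
ROTATIONS = [[0] * 5 for _ in range(5)]
_x, _y = 1, 0
for _t in range(24):
    ROTATIONS[_x][_y] = ((_t + 1) * (_t + 2) // 2) % 64
    _x, _y = _y, (2 * _x + 3 * _y) % 5


def _rol(value: int, shift: int) -> int:
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def _keccak_f(a: List[List[int]]) -> List[List[int]]:
    for rc in ROUND_CONSTANTS:
        c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rol(c[(x + 1) % 5], 1) for x in range(5)]
        a = [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]  # theta
        b = [[0] * 5 for _ in range(5)]
        for x in range(5):
            for y in range(5):
                b[y][(2 * x + 3 * y) % 5] = _rol(a[x][y], ROTATIONS[x][y])  # rho + pi
        a = [[b[x][y] ^ (~b[(x + 1) % 5][y] & b[(x + 2) % 5][y]) for y in range(5)]
             for x in range(5)]  # chi
        a[0][0] ^= rc  # iota
    return a


def keccak256(data: bytes) -> bytes:
    """Original Keccak-256 (0x01 padding), as used by the EVM; not NIST SHA3-256."""
    rate = 136
    padded = bytearray(data) + b"\x01"
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80
    state = [[0] * 5 for _ in range(5)]
    for offset in range(0, len(padded), rate):
        for i in range(rate // 8):
            lane = int.from_bytes(padded[offset + 8 * i: offset + 8 * i + 8], "little")
            state[i % 5][i // 5] ^= lane
        state = _keccak_f(state)
    return b"".join(state[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def word(value: int) -> bytes:
    """Left-pad an unsigned integer to a 32-byte big-endian word."""
    return value.to_bytes(32, "big")


def mapping_value_slot(key: int, mapping_slot: int) -> int:
    """Storage slot of m[key] for a mapping at slot p with a value-type key."""
    return int.from_bytes(keccak256(word(key) + word(mapping_slot)), "big")


def array_data_slot(array_slot: int) -> int:
    """First storage slot of a dynamic array's elements (its length lives at array_slot)."""
    return int.from_bytes(keccak256(word(array_slot)), "big")
```

A declarative format would have to describe this hashing rule somehow, which is why it is tempting to push it into a per-language code extension instead.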
And the Solidity compiler would maybe provide a natural-language specification, but we would translate it into code, and then int would be a primitive, and address would perhaps be a primitive. But wherever there's a gap, how do you represent what a mapping is primitively? Well, "primitive" is the wrong word here, I think. Well, so maybe we should... Who here has looked at the way that Solidity does mappings under the hood? I don't know if we should get into it. Yeah, well, let's not right now; we have slides to get through. All right, all right. So then you get to relative pointers: what about calldata pointers? Pointers in calldata are relative, but they're not relative to themselves. They're relative to the beginning of the structure that contains them, so that potentially poses some problems. How do we represent composite types? Presumably we need to give the people using this format a way to specify composite types and how they work. These could be structs, but they could also be things that are primitives in Solidity but logically composites: for instance, an external function pointer, which, after all, consists of an address and a selector, or an internal function pointer, which consists of two separate program counter values, one for the constructor bytecode and one for the deployed bytecode. It's really interesting how internal function pointers work in Solidity. Harry's saying it's an 8-byte value where the first byte is... First 4 bytes. The first 4 bytes are for the constructor. So you have a pointer to some internal function, and you want to store that in a variable; maybe you have a map function that takes another function, whatever. If you need to point to that function in the actual EVM bytecode, you have to point to some byte offset.
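Both kinds of function pointer are "primitives" that a debugger must split into parts. The external layout (a 20-byte address followed by a 4-byte selector, 24 bytes in total, encoded like `bytes24`) is from the Solidity ABI specification. The internal layout used below (8 bytes, first 4 for the constructor program counter, last 4 for the runtime program counter) follows the description in the talk and should be treated as an assumption that may vary with compiler version and code-generation pipeline. The type and function names are illustrative.

```python
from typing import NamedTuple


class ExternalFunction(NamedTuple):
    address: str   # 20-byte contract address, hex
    selector: str  # 4-byte function selector, hex


class InternalFunction(NamedTuple):
    constructor_pc: int  # byte offset into the constructor (creation) bytecode
    runtime_pc: int      # byte offset into the deployed (runtime) bytecode


def decode_external_function(raw: bytes) -> ExternalFunction:
    """ABI layout: a 20-byte address followed by a 4-byte selector (24 bytes)."""
    if len(raw) != 24:
        raise ValueError("expected 24 bytes")
    return ExternalFunction("0x" + raw[:20].hex(), "0x" + raw[20:].hex())


def decode_internal_function(raw: bytes) -> InternalFunction:
    """Assumed layout from the talk: first 4 bytes constructor PC, last 4 runtime PC."""
    if len(raw) != 8:
        raise ValueError("expected 8 bytes")
    return InternalFunction(int.from_bytes(raw[:4], "big"), int.from_bytes(raw[4:], "big"))


if __name__ == "__main__":
    print(decode_internal_function(bytes.fromhex("0000012c00000045")))
    # InternalFunction(constructor_pc=300, runtime_pc=69)
```

Which half a debugger uses depends on whether the current trace step is executing creation code or deployed code, so the format would also need to say which bytecode is running.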
But with smart contracts, you have the constructor bytecode, the code that runs when you deploy the contract, and you have the runtime bytecode, which runs when you send a transaction to the contract. They're not the same bytecode. So if you want to point to a function, you have these two separate bytecodes, and you need an offset into each of them. So you need to represent both of those. Yeah, okay. Anyway, moving on. This second-to-last one I've already talked about: how do we draw the line between relying on code-based extensions and relying on data built into the format? Nick and I have rather differing opinions on this, but let's save that for another time. And then, oh, I skipped the first one: how do we represent where local variables are? This is not a fixed location. State variables live at fixed locations. Local variables live on the stack, so their position depends on what's below you on the stack: where did this function's stack frame start? If you're used to running things on x86 or whatever, you'd say, oh, you look at the frame pointer. There are no frame pointers in Solidity. In other words, to try to give a second explanation, let's say you call a function. When you call that function, it goes onto the call stack as a frame. There are a bunch of variables; maybe your function declares a, b and c. Those three variables go on the stack. Then you have some working space: maybe you evaluate some complex expression and have to add and then multiply and so on, so you have a bunch of other values on the stack on top of the variables. You have to know where that frame begins. Why didn't they use a frame pointer?
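Without a frame pointer, a debugger has to reconstruct frame boundaries from the trace itself. A minimal Python sketch of one way to do that, assuming that a frame begins at the stack height observed on a jump-into-function step (source-map jump type `"i"`) and ends on a jump-out step (`"o"`), and that a hypothetical compiler-supplied slot number gives each local's position within its frame. Real Solidity frames also contain return addresses and arguments, so the actual offset bookkeeping is more involved.

```python
from typing import List


class FrameTracker:
    """Track where each active stack frame begins, since the EVM has no frame pointer.

    Assumption: a frame begins at the stack height observed on a jump-in step
    (jump type "i") and ends on a jump-out step ("o").
    """

    def __init__(self) -> None:
        self.frame_bases: List[int] = [0]  # the outermost frame starts at the bottom

    def step(self, jump: str, stack_height: int) -> None:
        if jump == "i":
            self.frame_bases.append(stack_height)
        elif jump == "o" and len(self.frame_bases) > 1:
            self.frame_bases.pop()

    def read_local(self, stack: List[int], slot: int) -> int:
        """Read the local at a hypothetical, compiler-supplied slot in the current frame.

        stack[0] is the bottom of the EVM stack.
        """
        return stack[self.frame_bases[-1] + slot]


if __name__ == "__main__":
    tracker = FrameTracker()
    stack = [0xAA, 0xBB]            # caller's values
    tracker.step("i", len(stack))   # jump into the function; its frame starts at height 2
    stack += [5, 7, 12]             # locals a = 5, b = 7, then scratch space
    print(tracker.read_local(stack, 1))  # 7 (local b)
    tracker.step("o", len(stack))
    print(tracker.frame_bases)      # [0]
```

The scratch values pushed above the locals do not disturb the lookup, because positions are measured up from the frame base rather than down from the top of the stack.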
There's no dynamic access to the stack. Oh, that's a really good point. They're thinking about adding those opcodes, but those simply do not exist. And there's no way you could have anticipated that, like, oh yeah, we're going to wait for them to add dynamic access to the stack. That's actually a really good reason that I didn't think of. Well, okay, so we're not dealing with frame pointers then. So anyway, there are no frame pointers; the question still stands. I also want to tack on that Solidity's modifier feature poses additional problems, which I'm not going to go into right now, because that would take us far afield. Suffice it to say that modifiers complicate the matter. Great. So these are some of the problems we want to solve. Does anyone else have anything to add? I can edit the slides. Anyone? Yeah. Maybe also take an approach from the other side: don't start with how we do it, but rather ask what information we want to have during the debugging process. Cool. Yeah, absolutely. Anything else? I also have another thing. Here we go. Sorry, Chris, go on. How to deal with the intermediate representation: if you compile through Yul, do we also want to represent that somehow? If we have a value on the stack, do we only want to know which Solidity variable it represents, or do we also want to know which Yul variable it represents? Yeah, that's interesting. Then you'd have both in the same location: this is this Solidity variable, and this is this Yul variable. That's not the point. It seems that the debugging format is very highly coupled with some of the new languages. Have you thought about how to decouple it a little bit? Yeah, well, we want to. We want it to be more general than just for Solidity, and part of the idea of having a debugging data format is to help provide that decoupling.
Because right now our debugger is specifically a Solidity debugger; it relies heavily on specific knowledge about Solidity. We're hoping that with a debugging data format, whatever the compiler is, it could provide a lot of this information in the format. As I mentioned, I talked a lot on the other slide about Solidity-specific challenges. But the idea is that any challenge we have to solve for Solidity is, at the very least, a problem we have to solve: the problem is at least this hard. Right. I'm not trying to exclude other things; I'm just saying the problem is at least this hard. Yeah. To your point, you can look at this. You can declare these base types, and this is how DWARF works. DWARF works for any language: you can get DWARF output for C, for Pascal, and so on. It provides mechanisms to specify that an int is size 4, four bytes, while on another platform or in another language you might have 8-byte integers. So DWARF provides mechanisms for doing this, but it doesn't get as complicated as, say, Solidity storage arrays versus Solidity memory arrays. Like, structs and arrays in storage and memory: which one's tightly packed and which one's not? Storage first? Oh, storage is tightly packed. Yeah, so storage arrays are tightly packed, but memory arrays are not, so how do you represent that with something like this? It's unclear. I'm not even sure... Here's a question for you: do you represent that as a property of arrays? I probably would have represented it as a property of the underlying types. I'd have said, oh, in memory a uint8 takes 32 bytes, even though it's called a uint8, and in storage a uint8 takes a single byte (8 bits). But both of those are viable representations.
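The packing difference can be stated as a few lines of arithmetic. A minimal Python sketch of the storage rule from the Solidity documentation (value types share a 32-byte slot in declaration order, a value that doesn't fit in the remaining bytes moves to the next slot, and offsets count from the low-order end of the word) next to the memory rule for dynamic arrays (a length word, then one full 32-byte word per element). Structs and arrays, which always start a fresh storage slot, are deliberately left out; the function names are illustrative.

```python
from typing import List, Tuple


def pack_storage(sizes: List[int]) -> List[Tuple[int, int]]:
    """Assign (slot, byte offset) to consecutive value-type state variables."""
    layout = []
    slot, offset = 0, 0
    for size in sizes:
        if offset + size > 32:  # doesn't fit in what's left of this slot
            slot, offset = slot + 1, 0
        layout.append((slot, offset))
        offset += size
    return layout


def extract(slot_word: int, offset: int, size: int) -> int:
    """Read a packed value; offsets count from the low-order (rightmost) end."""
    return (slot_word >> (8 * offset)) & ((1 << (8 * size)) - 1)


def memory_element_offset(index: int) -> int:
    """Offset of element `index` in a dynamic memory array: length word first, 32 bytes per element."""
    return 32 + 32 * index


if __name__ == "__main__":
    # uint8, uint8, uint256, address, uint96, uint8
    print(pack_storage([1, 1, 32, 20, 12, 1]))
    # [(0, 0), (0, 1), (1, 0), (2, 0), (2, 20), (3, 0)]
```

Whether a format encodes this as "uint8 has size 1 in storage and 32 in memory" or as "storage arrays are packed" is exactly the design question raised above; the sketch works with per-type sizes.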
So you need to figure out which one to go with, because there are other differences between memory arrays and storage arrays. I'd say the bigger difference is how they handle reference types. But, neat stuff. All right, so here are some links for you. Harry wrote a paper that documents all of Solidity's data representation. Yeah, those of you who work on Solidity: I'm just recapitulating all the work you did and putting it in a form that I hope is readable, but... It has tables. They're good tables. They are. So, Truffle is about to release a library called Truffle Codec, which handles encoding and decoding of values, not only Solidity values but also ABI values. I'm just trying to show you what we're dealing with. We have this representation of all values in Solidity; this is the output format that Truffle Codec will provide. These are JavaScript objects, and you can see we modeled the types. Each result has a type field and a value field, and the type field represents one of these types. Yeah, and I should note that this particular thing is very Solidity-specific, so don't look to it too much for the debugging data format, because we really do want that to be more general. Right, but this is a start, because you can say we have a struct type... oh, let's just do a dynamic array. A dynamic array has a base type, which is another type, so that might be an integer. And it has a location, which might be memory, storage, calldata, or the stack? I guess arrays don't go on the stack; they can't. So those are types, and then we also have values, which represent the corresponding value. A pair of type and value gives you a representation of your source variable, or whatever you're trying to decode. So our hope, at least, is that this kind of approach will lend itself to a more general solution.
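The type/value pairing can be illustrated with a small Python decoder that reads memory words and returns `{type, value}` results, with a dynamic array's type carrying a base type and a location. The field names (`typeClass`, `baseType`, `location`) are illustrative and should not be read as the actual Truffle Codec API. The sketch handles only unsigned integers, booleans, and dynamic memory arrays of those; nested memory arrays store pointers and are not covered.

```python
from typing import Any, Dict, List

Type = Dict[str, Any]
Result = Dict[str, Any]


def decode_memory(type_: Type, words: List[int], pos: int = 0) -> Result:
    """Decode a value from a list of 32-byte memory words into a {type, value} pair."""
    kind = type_["typeClass"]
    if kind == "uint":
        return {"type": type_, "value": words[pos] & ((1 << type_["bits"]) - 1)}
    if kind == "bool":
        return {"type": type_, "value": words[pos] != 0}
    if kind == "array" and type_["location"] == "memory":
        length = words[pos]  # the first word holds the length
        items = [decode_memory(type_["baseType"], words, pos + 1 + i) for i in range(length)]
        return {"type": type_, "value": items}
    raise NotImplementedError(kind)


if __name__ == "__main__":
    uint8_t = {"typeClass": "uint", "bits": 8}
    array_t = {"typeClass": "array", "kind": "dynamic", "baseType": uint8_t, "location": "memory"}
    result = decode_memory(array_t, [2, 5, 7])
    print([item["value"] for item in result["value"]])  # [5, 7]
```

Because every element result carries its own type, a consumer can render nested values without knowing how they were laid out in memory.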
But currently this is not serializable, because values in Solidity can be circular, and we can't serialize that to JSON very easily. So there's more work to be done for sure. And of course, shout out to the Solidity docs for being excellent. Actually, the circularity can kind of be worked around. Yeah, well, we have ideas for that. Anyway, circular stuff is not particularly relevant right now. All right. So the hope here is that we can break into groups and talk about a high-level structure: what would this look like? Give us 20 minutes for that, unless everyone gets bored. Compare with DWARF. Maybe we can make it JSON, at least. Maybe we can get the spec under 400 pages. In DWARF, the top-level concept is called a DIE, a debugging information entry; that's the thing with the tag. Each DIE has a tag and a number of attributes. So maybe a start here would be to figure out how we can do this in JSON. We have scratch paper. Talk amongst yourselves. There are a lot of people here. Yeah, I was expecting tables, but now we just have chairs. All right. I'll give everyone... At least some people are talking. Anyone want to design a data format today? That's perfect. Oh, yeah, sure. All right. Maybe just call out ideas. What's the top level? Let's stick with JSON. Is it an array or an object? How do you know that in advance? That's terrible. I'm just... All right.
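One possible starting point for the JSON exercise, sketched in Python: a top-level object holding a flat list of DIE-like entries, each with an `id`, a `tag` and `attributes`. Entries refer to each other by id rather than by nesting, so a recursive type (a `Node` struct containing a `Node[]`) serializes without cycles. Every name here (`format`, `entries`, the tag strings, the `location` shape) is hypothetical; this is an illustration, not a proposal from the talk.

```python
import json
from typing import Any, Dict

# Hypothetical layout: an object wrapping a flat list of entries that reference each other by id.
debug_info: Dict[str, Any] = {
    "format": "example-debug-info",
    "entries": [
        {"id": 1, "tag": "base_type", "attributes": {"name": "uint256", "byteSize": 32}},
        {"id": 2, "tag": "struct_type", "attributes": {
            "name": "Node",
            "members": [{"name": "value", "type": 1}, {"name": "children", "type": 3}],
        }},
        {"id": 3, "tag": "array_type", "attributes": {"baseType": 2, "length": None}},
        {"id": 4, "tag": "variable", "attributes": {
            "name": "root", "type": 2, "location": {"storage": {"slot": 0}},
        }},
    ],
}


def type_name(entries: Dict[int, Dict[str, Any]], type_id: int) -> str:
    """Render a type by following id references; named types end the recursion."""
    entry = entries[type_id]
    attrs = entry["attributes"]
    if entry["tag"] == "array_type":
        size = "" if attrs["length"] is None else str(attrs["length"])
        return f"{type_name(entries, attrs['baseType'])}[{size}]"
    return attrs["name"]


if __name__ == "__main__":
    text = json.dumps(debug_info, indent=2)  # no cycles, so plain JSON works
    decoded = json.loads(text)
    by_id = {entry["id"]: entry for entry in decoded["entries"]}
    print(type_name(by_id, 3))  # Node[]
```

Wrapping the list in an object leaves room for top-level metadata such as a format version, while the flat, id-referenced list keeps the DWARF idea of independent entries with tags and attributes.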