It looks like we're live. People can't see you because I haven't moved it around. But it looks like that worked. Hello, folks. If you're watching this after the fact, sit tight. We've had some technical difficulties. We're getting going. So check the notes doc for time codes below to know when we actually get everything going. Looks like we're working. Thank you to DCD David for doing time codes. I saw that you're here. I really appreciate it every week. And let me just pause this. And whoof. So I just recompiled OBS. Yay, we got some yeas in the YouTube chat. And folks can hear you, Jim, but they cannot see you because I'm on the wrong screen. Let's see. Hello, everybody. Now people can see you. Let's see if I can't make this a little bigger. All right. OK, so let me do housekeeping. I think we're as good as it's going to get. If we need to see stuff later, we can tweak things. So let's do housekeeping. All right, so hello, everyone. You can see me up here. That's right. My name is Scott. I work on CircuitPython for Adafruit. This is a deep dive. Oh, if you don't know, CircuitPython is a version of Python designed for microcontrollers, which are little inexpensive computers that are really easy to get going with, for programming generally, but also for interacting with the real world. That's what they excel at. Adafruit is an open-source hardware and software company based out of New York. They pay for me to work on CircuitPython, and they pay for me to do these streams. So if you want to support me, support them by going to adafruit.com and purchasing some hardware there. If you want to chat with me and a lot of others, join our Adafruit Discord server by going to the URL adafru.it/discord. We're there all week, not just during the streams. That's what makes Discord so awesome. This is a deep dive. They happen every week at 2 PM Pacific if things start on time like they should. But this week, we're going about 20 minutes late. So oops.
Thank you to Jim for getting up early and then waiting for me to get everything working. Normally this happens Fridays at 2 PM Pacific. That's 7 AM for the Australians in the crowd. Generally they're on Fridays; we do occasionally shift it to Thursdays. We typically go for two hours or more. We should be able to do that this week as well if we've got enough to talk about. Questions are welcome, although we're going to get started with Jim here since he's been patiently waiting. But we'll answer questions as they come up. And then next week should be on Friday as well. I looked at that. And last up, the cat who you can't see is epileptic. So just be aware that if I'm not talking and I'm watching something, that's because I'm making sure he's OK. And there's roofers next door, so they're making noise as well. All right. Well, that's housekeeping. We're going again, which is great. So who is this on the screen, Jim? Why don't you introduce yourself a little bit first? Let's start with the basics before we get super in the weeds. All right. Well, hi, Scott, and hi, everyone. Thanks for having me on your deep dive. I work on MicroPython. I'm sure everyone here is familiar with MicroPython and CircuitPython. I've been involved for a few years after coming from an education and teaching background. I fell in love with MicroPython on the micro:bit and had a lot of fun getting kids excited about computers and electronics. Awesome. I was wondering how you got into MicroPython. Yeah, it's been a lot of fun, actually. And over time I got more involved and eventually met Damien at PyCon. And now I do this. This is my job. I spend my, well, yeah, when I'm not studying, I work on MicroPython and do everything for them. So you're paid by George Robotics, right? So if you want to support Jim and Damien, you can buy a PyBoard, a PyBoard D. That's a great way. And they also do contract work.
So if you're looking to put it in a product, that's a way to do it. And GitHub sponsorship, which I did recently as well, which is really awesome. Awesome. Yeah, so I highly encourage folks to support them as well. They've done so much great work and continue to do great work. And CircuitPython benefits from the MicroPython work. So thank you, Jim, for working on that. And thanks to Damien and all of the other MicroPython contributors as well. Thank you. Yeah, so I think that is a good introduction. So you reached out. We had a discussion, and I said, you know, Damien, Jim, if you ever want to come on our deep dive and talk about something, let me know. You're welcome. So what made you email me and be like, hey, I've got this thing that could be fun to talk about? Well, I think I'm very excited about your recent work to merge up to 1.16 in CircuitPython. It is a really good opportunity to bring the two projects even closer together. At the end of the day, much of the code base is the same, and our bugs are your bugs. And yeah, I thought it sounded interesting. I like learning and teaching. And certainly I've spent a lot of time banging my head against MicroPython, trying to figure out, like any code base that's large and complicated, trying to get my head around how things work. And we had an issue that was reported, a compilation issue. And those are pretty serious, obviously, because they lead to incorrect execution of code. And yeah, I thought it might be the sort of thing that you'd like to dive into in the process. I'm happy to take this wherever people want. Full disclosure up front: I have already diagnosed and fixed this bug, because watching me struggle against this live would have been terrible. But if people have questions. If people have questions. So we do actually have a tangential question from a couple of folks. What is that black box with the blinky lights on the wall? What is the black box with the blinky lights?
Under the bookshelf. That one? Yeah, that one. That is my CPU. So when I was teaching, I used to want to teach how computers worked and demystify electronics. And you can start with AND gates, and you can finish with MicroPython. But there's this big jump in between. How does logic happen? Right, right. And so I used to teach a little course on four-bit microcontrollers that you could build out of basically 7400-series logic. And it wasn't quite enough. And then I discovered my two favorite websites on the internet. One is the Megaprocessor, which is the guy that built an entire CPU out of discrete transistors. If you haven't seen it, megaprocessor.net, I think, is the website. Megaprocessor.com. And there's this great line in it, because you can imagine it's enormous, and it's like, at this point, the battle between my living room and the Megaprocessor begins to escalate. So obviously too big. And then at the other end of the spectrum, there's the Monster 6502, which is somebody who built a 6502 processor out of surface-mount FETs. And the cool thing that both of these projects have in common is that every single gate has an LED on it. And you can single-step the processor and watch every single gate change state. And so that's what that is. Except rather than being built out of individual MOSFETs, it's small logic units. So one logic unit is a register, so eight flip-flops or whatever. And anyway, so that's what that is. I was going to say, so I have two follow-up questions. Yeah, it doesn't run MicroPython. That was the first one. The second one was, how long does it take to compile MicroPython? So right now it's running. I leave it on all the time because it's one of those projects that gives me a huge amount of satisfaction that it's just done. I'm really happy with it. And it's in the background computing prime numbers. And I don't know, it's 100-something, I can't really quite see from here. What is that? The clock speed is artificially really low.
And so it's 157 at the moment. I see. But yeah, you can speed it up. Anyway. That's awesome. But what it can do is drive DotStars. So it's got memory-mapped I/O, and then it can bit-bang SPI. So that's quite fun anyway. So well done. Nah, this is a perfect time. OK, and props to Mr. Who30 and Mark Olson who asked that question. I usually do say hi to folks who are in the chat, but we've been trying to get things going. All right, so let me get some windows arranged so that we can have the desktop up so we can follow along. So if folks have questions, feel free to ask them. I'm going to create a new scene here and actually call it desktop with guest 2. So I know you're seeing a black screen. Don't freak out. I will add me, desktop, and capture. There, now we can see Jim. So this is going to be a bit of a challenge in that I only have so much screen space. You know what? We can put Jim beneath what we're looking at. Then I can do window capture. What I'm trying to do is get it so that folks can see both of us and my desktop. I think that would be ideal, because I assume we're going to want to talk about the code that we're looking at. Yeah, should I be sharing my screen? I think I'll do it. I don't want to. I've already had too many technical difficulties here. So I think you'll have to show me what you want. OK, have you got the Unix port set up and ready to go? No, I do not. Should I have? Yeah, we'll need the Unix port. Well, can we give it a go and see how it works? Otherwise, you could try screen sharing. We could set that up if you'd like to drive. People are putting up with us while we change things up. Oh, you know, that might be it. How's that, folks? Is that too small? You might want to make things bigger. This actually saves me. Yeah, and we can only see, oh yeah, that's your terminal. All right, can folks see Jim's screen? Enhance. Or enhance. Can't see the file list. We don't need the file list. That's fine. That just gets us some space.
All right, I think that's as good as we're going to get. Does the drop saw run MicroPython? No, but the laser cutter sort of does. All right, take us away, Jim. OK, cool. So this was based on a bug that was reported about a week ago. And this is exactly the repro that was given to us. And I guess what it's doing is it's turning a single integer into four bytes, pulling out individual bits in that integer to set the four bytes. And so I should be able to run this on the PyBoard I've got plugged in. And to be correct, for the bytes I get 255, 0, 0, 0, because the input was 0, so these three conditions should be 0. So no surprises there. That's good. So MicroPython has this feature, @micropython.native. And what this does is it tells the compiler running on the board: rather than generating bytecode for this function, generate native code for whatever the architecture is, which in this case is ARM. And it's supposed to be, you just don't even have to think about it. You just turn it on and everything is the same. The native decorator just compiles it to native code and no features are missing. There is also a Viper emitter, which we can talk about later. But there is a reason you would not just make everything native. In fact, in the very early days of MicroPython, before it was released publicly, Damien's goal, he told me, was that everything would always use the native emitter for performance. So MicroPython wouldn't have a virtual machine. It would just generate native code for everything. The problem you have is that native code uses a lot more RAM. MicroPython has to execute this compiled code from RAM, so it compiles it, generates it into RAM, and it takes up a lot more RAM for the same functionality. So this same function would be a lot bigger in native code. Sorry, I'm going to pause. Could you make your terminal bigger? Yeah, of course. Your text bigger there. Folks are having a little trouble seeing that. That looks better.
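The native decorator and the CPython-compatibility trick that comes up later in this conversation can be sketched roughly like this. The function `scale` and its body are made up for illustration; on a real board you would just apply `@micropython.native` directly.

```python
# On MicroPython, @micropython.native compiles this function to
# machine code for the board's architecture. On CPython, the
# micropython module doesn't exist, so we fall back to do-nothing
# stand-ins -- the compatibility trick discussed in the stream.
try:
    import micropython
except ImportError:
    class micropython:          # minimal stand-in for CPython only
        @staticmethod
        def native(f):          # passthrough decorator
            return f
        @staticmethod
        def const(x):           # passthrough const()
            return x

@micropython.native
def scale(values, factor):
    # A hypothetical hot loop that would benefit from native emission.
    return [v * factor for v in values]

print(scale([1, 2, 3], 10))  # [10, 20, 30]
```

The same code then runs unchanged on both interpreters, which is exactly the point of requiring the explicit `micropython` import.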
I can see it better than other people, too. OK, so that's interesting. So I was going to ask you, what is the reason that people would use native, the native emitter? Yeah, entirely performance, if you can afford the RAM, for a couple of small functions. And the RAM overhead is about double, I think, from memory. And it varies a little bit depending on the function. And the reason for this extra overhead is that in MicroPython, like CPython, in bytecode you can express quite a lot in one opcode. So an add of the top two things on the stack, whatever. Whereas expressing that in native code, you have to actually emit any type checks, literally everything the CPU has to do to implement that one high-level Python instruction has to be generated as potentially many, many bytes of machine code. And you think about something really simple in Python, like A plus B: you've got to figure out the type of A, you've got to figure out the type of B, you've got to figure out which method to call, and things like that. And the details are a bit complicated, and there's a lot of optimizations. But when your competition is a one-byte opcode, it's pretty tough. So a long time ago, MicroPython switched to generating bytecode by default. And you move the cost into flash, because all of the code for the VM is in flash, which you generally have more of. And you put a lot less in RAM. And this is the benefit of freezing your Python code: when you freeze it, the code goes in flash as well. So if you're freezing, it's worth considering using native because it would be a performance boost. So I just wanted to point out, for those folks who are not classically trained computer science folks, this is a classic trade-off. This is the classic trade-off of space versus execution speed. And so that's not surprising to me at all. Yeah. Yeah. There's a surprisingly large amount of things like that in MicroPython.
I guess that's true in any software project. It's always finding the right point in trade-off space. And Scott and I have talked about this a lot before, but there's lots of features where we would love to provide more knobs for the user to decide what their code does. But we have to maintain Python compatibility. Things like the native decorator obviously are about as close as you can get to keeping Python syntax while still giving a few hints to the compiler. One thing that I did do pretty early on when I was working on MicroPython is I really pushed Damien to require a micropython import first, like this. Which I think you can do. I think you could say import micropython, and then it will work. But that was something that I was a little nitpicky on. And the reason that I was nitpicky on that, for folks, is that if you took this code without the import of micropython and tried to run it in CPython, you would get an error because micropython was not defined. Yeah, exactly. And you could just write a micropython.py that had def native, whatever, that just returns the function. Yeah. Returns it, whatever. And then if you had that in your path, then this would just run on CPython as well. Yeah, exactly. Right. Yeah, and where we see that most is, if folks have ever seen the const thing in CircuitPython or MicroPython code, const is the same idea as that. MicroPython knows to just use it on its own when it sees it. But we kind of enforce in CircuitPython libraries that you need to say from micropython import const. And that allows us to have a const thing in Blinka that just passes the value through, just like Jim described for native as well. And so for people who are interested, this example code wouldn't get any benefit from this. But if I had a program where I had some variables I'm using in lots of places, I can do these two different versions of the const keyword, with and without the leading underscore.
And what that means is that anywhere I use these variables A and B, it's not really a global variable. I'm not having to do a lookup into the globals dict. It literally is as if I typed 20, like I have with these other literals. And then B, the underscore version, has this extra thing where, as well as avoiding a global lookup when I use it here, it also doesn't even get added to globals. So I can't even use it from outside the module. So it literally has no RAM cost at all. Right, and that's what we found, is we tend to only use const for things like that, because I think people don't always realize that there's a RAM cost to variable names. Like, the variable names are stored in RAM. Only for global variables, and actually that's something we're gonna talk about, because for local variables, the variable names don't exist at all. Right, okay. So, the native decorator: for those new to Python, the at symbol before a function is called a decorator, and functionally the way it works is that that thing gets called with the function passed to it, right? If you were thinking of classic CPython. Now, MicroPython is playing tricks here when it sees that, but that's generally why it's valid Python. So the native emitter emits Thumb code. It takes more RAM than the VM bytecode, but it runs faster. Do we do much optimization with that Thumb code? Because I had somebody say, oh, that's like GCC or something, and I was like, well, it's not quite a compiler like that. It's way, way, way faster because obviously the VM has these huge costs; the VM is just a loop with a big switch statement and all these branches and stuff like that. But it is really fast because it linearizes the code, I guess. But no, there's very little optimization. But I guess that's a good time to talk about it. So we also have this thing called Viper. Well, wait, we have a question here.
Let's keep up with it. So the question is, what's Thumb code? Yeah, sure. I'm probably gonna mess this up, but so, we're talking about generating native code for the target architecture, and on our PCs, on x86, that would be x86 machine code, or more recently x86-64. On ARM, historically, ARM has two sort of similar instruction sets, ARM and Thumb. And Thumb is what we run on Cortex-series microcontrollers. I will not even attempt to go into the distinction between those, but when I say Thumb, Thumb is the name for native code on ARM. Yeah, so I wanna tie in two things here. So Thumb is an instruction set architecture, which is usually shortened to ISA. And the way that I think about it, for folks who don't quite understand what this is, is it's really the API between software and the CPU. It's: what are the bits that the CPU loads to decide what to execute? That's kind of what I think of as the ISA. And then the other thing I wanted to tie in is folks have probably heard of RISC-V. And people say, do you support RISC-V? Do you support RISC-V? And RISC-V itself, at its core, is just the definition of what the ISA is. It has no bearing on the actual chip, the actual CPU, like a Cortex-M0, for example, right? The ISA is like this API that anybody can implement, whether it's open source or not open source. And then the compilers convert human-readable C code or whatever code down to that machine-level instruction set. So I just wanted to take that opportunity to talk about that, because I hear about RISC-V stuff all the time. We support RISC-V, but RISC-V is not the thing that's hard for us to support. It's all of the peripherals that end up in a chip that has a RISC-V CPU. It's a great point, yeah. Yeah, so I thought I'd just point that out since we're talking about Thumb. And one characteristic about Thumb, and RISC-V has an extension for this too, is code size.
So like when you take some sample C code, how big the actual instructions are. And Thumb-2 is the second revision of Thumb, and it has a lot of 16-bit instructions, which means that it is much smaller generally. And you can find comparisons with the compressed instruction set for RISC-V and see how code size differs between those two things. Yeah, and it's funny, because the R in ARM originally was RISC as well. And RISC here is reduced instruction set computing. But these days when you look at ARM and the ARM Thumb instructions, it's not really very reduced anymore. I mean, we talked about the CPU on the wall earlier that has, I think, I don't know, it's like 10 instructions or something like that. But nowadays on ARM, you can do things like add this with a dereference and an offset, because it's got this barrel shifter that can do all these fancy calculations inside an instruction. And yeah, it all gets pretty complicated, right? I will link it after this: if anyone's interested in an intro to Thumb, Raymond Chen on the Old New Thing blog has just recently done a like 25-part series on Thumb, and it's amazing, as everything he writes is. I should read that. And I also wanna, so RISC is reduced instruction set, and the counterpart is CISC, complex. So x86 is generally known as CISC, like not reduced, the idea, as Jim started talking about, being complex instruction sets that have these very long instructions that can do a lot all at once potentially. But I think the world's going towards reduced instruction set stuff. And if you read about the design of the M1, Apple's ARM chip, one thing that they really exploit is that if you have fixed-length instructions, it means that you can look ahead really, really far into the things that you're going to potentially do in the future.
And the advantage of that is, if you get into computer design, and we're in these weeds, which is great, that's what the deep dive is there for, the challenge with any CPU is generally that your memory is a lot, lot slower. So it's really beneficial for chips like the M1, where they can look, I think it's like 128 instructions ahead or something, and they can initiate all of the RAM lookups that may need to happen so that, just in case they need that data, it's closer and it's already on its way. Versus the x86 model, where they can't necessarily look that far into the future because they don't exactly know where their instructions are. Yeah, bigger and bigger hacks to try and figure out what the dependencies are and stuff like that. So when people talk about a CPU having deep pipelining, that's what they're talking about. It's that the pipeline is really long because they can read ahead. Right. Well, sort of. There's a lot of concepts here. And yeah, but we're in the weeds, because another related thing. I found my heart. This is amazing. No, it's great. I mean, that's what people come for. The other thing is, if folks have heard about the security issues in CPU designs, it has to do with this speculative stuff that the CPU is doing. So the idea is that you can go into a branch of code and it will say, oh, you know, I might need this memory in the future, therefore I'm gonna go fetch it. But then you didn't actually need it, because you weren't allowed to fetch it. But then if you can fetch it in another way, you can see that it was cached. So there's a whole, yeah, Meltdown. There's all these sorts of CPU vulnerabilities that are about things that CPUs do ahead of time, just in case they might need to do it.
And then making sure that doesn't impact code going forwards is a huge security thing that a lot of CPUs have really been having to deal with, which is interesting, but not something that we need to deal with in MicroPython land. Yeah, so there are two things that I think are worth bringing up there, though. You have hit on one of my secret hidden agendas for this deep dive, which is that it would be really cool one day to get a native emitter for RISC-V for MicroPython. By raising awareness of how the native emitter works, I hope somebody might get excited, and we'll have a little look at how you write a native emitter for MicroPython. And so RISC-V support would be really cool. And there is one RISC-V chip, which we're gonna see in a little bit, that MicroPython effectively is already on the way to supporting, which is supported by the IDF, and that is the ESP32, is it the C3, I think? Yeah, which is at its core RISC-V rather than the Xtensa of the previous generations. And like you say, it's about supporting the peripherals, and so the IDF takes care of that for us. Right. Yeah, I mean, the main thing for us is that it doesn't have native USB, so we would have to get BLE support through the IDF, and then we would be able to do those as BLE-only boards, which is something that we do want to do. But we're in the weeds of BLE right now, as folks who have watched the stream know. I sympathize with you guys. We're in the same weeds on the other side as well. Well, some of the weeds that we're in now, though: I talked to you about this file transfer protocol for BLE, and the challenge now is it's not so much the device side — like, there's bugs on the device — but the main challenge for us right now is actually having apps and code that support it on the host side.
So that's really where we're trying to push right now. Like, I was doing Web Bluetooth stuff, and I was very pleased that this protocol does not use L2CAP stuff, because that's not available through Web Bluetooth. So I think we're on the right track with this file transfer protocol stuff over BLE, but we just don't have the apps right now to make it worthwhile and a good experience. So that's kind of where we're pushing right now. And then we'll cycle back to the device side once we can actually get people testing it and finding the issues that we have. Yeah. I'm going to resist the temptation to follow that side quest and talk about L2CAP channels. Yeah. Well, I did hear from a couple of the Chrome people. So I'm looking forward to the Chrome Web Bluetooth support improving and hopefully actually getting them using CircuitPython to test with, which would be amazing, because they're working on supporting secure characteristics in Chrome OS and Windows, which they don't do currently. But I've set CircuitPython up to use that, because imagine you have a room full of people, and, blah, blah, blah, so. Ah, don't get me started. You keep tempting me with Bluetooth side quests. Okay, let's pop the stack. Let's pop the stack. We'll do some stack popping later as well, cool. Okay, so the thrilling conclusion is that I marked my function as native, and I expected it to be exactly the same based on everything I said before, but unfortunately it isn't. And we see this second byte is 255, which is clearly wrong, because I passed in a zero; zero AND one is zero, so I should hit the else 0 case. Right. And that's a worry. And in this case, we're running, so mpr is mpremote. It's a new tool for MicroPython. 1.15, right? It was released with 1.15, or is 1.16 out? 1.16. Yeah, so mpremote — what's your equivalent? — it loads a file into RAM without touching the file system and then executes the file out of RAM.
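The repro file isn't fully visible on stream, so this is only a hypothetical reconstruction of its shape (the name `unpack` comes up later; the exact bit layout is a guess): one integer in, four bytes out, each data byte driven by a single bit test. This is the correct behavior under the bytecode emitter that the native emitter was failing to match.

```python
# Hypothetical reconstruction of the style of repro described:
# turn a single integer into four bytes by pulling out individual
# bits. The real file and bit positions are not shown on stream.
def unpack(packet):
    buf = bytearray(4)
    buf[0] = 255                        # fixed header byte
    buf[1] = 255 if packet & 0x01 else 0
    buf[2] = 255 if packet & 0x02 else 0
    buf[3] = 255 if packet & 0x04 else 0
    return bytes(buf)

print(list(unpack(0)))  # [255, 0, 0, 0] -- all bit tests false
print(list(unpack(1)))  # [255, 255, 0, 0] -- low bit set
```

With an input of 0 every condition is false, which is why a 255 in the second byte under the native emitter was clearly a miscompilation.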
It also lets you do this quite neat thing, which is you can go mpr mount with a directory on my host PC. And now I'm in the REPL on the device. And I can import the repro file, and what it's actually done is made that host directory the file system, like it's mounted into the VFS on the board. Right. So this is really neat for this sort of debugging, because if I just soft-reboot the board, and I could have edited this file — I'll change it to a one, import it again. Well, that's actually the right answer now. But anyway, yeah. Right, so if I had to summarize what the advantage of this is: this is kind of the CircuitPython workflow, but it's over serial only. So if you're using an ESP32, for example, which we don't support in CircuitPython at all, you can have this standard workflow that has file system stuff across devices, even if they don't have native USB. Exactly, yeah. And with very precise control over when the reboots happen and stuff like that, and no contention over file system access, because there is only one file system: the host file system. Interesting. So you're completely losing access to the native one. I mean, it's still there. I haven't disabled it, but yeah. I mean, because this is an STM32, it has that. But we almost never use that. Interesting. Okay, so it's failing on — do we want to talk about Viper quickly? Do whatever you like. We got tons of time. Okay. I mean, we started 20 minutes late. We've got at least an hour, so. So we've also got Viper. Okay. Viper is also failing here. It's giving the wrong output. Okay. And so, what Viper does — well, first of all, does the native emitter do any optimization? It doesn't. It just emits the native code that would execute the same thing that the VM would have done. What Viper does instead is say, well, I know a little bit about this code.
For example, I know that packet is an integer. So when I do, say for example, an AND or an add or any operations that involve integers, don't go through the normal Python implementation, you know, the dunder add or whatever. It is an integer, so emit the code as if it were that and not really a Python object. And this enables a ton of optimizations if your code is doing a bunch of number crunching. And it can be hard to get right, because you have to tell the compiler exactly what all the types are. And, you know, it can only go so far with its inference and whatever. It falls back to native, so if you use regular objects, you've still got all of that sort of thing. But when it knows what the types are based on the argument annotations, it can be really, really fast. So, you know, you might get another 2x performance boost from Viper. I didn't even realize — okay, so Viper is actually like super-native. Like, it's better than regular native. I didn't even realize we had any sort of logic in MicroPython based on type hints. I didn't know that. That's the only place, in the Viper emitter, yeah. Okay, but we can put type hints basically everywhere and we just ignore them otherwise, right? They definitely work in function arguments, and I can't remember what the state of type hints everywhere else is. I think there's possibly an outstanding PR to do that. But again, in that case, it's extra code size for a feature that does nothing. But on the other hand, it lets you run unmodified Python code. So it is a good feature. Which is where my interest comes from, is I think the killer feature of type hints is actually just IDE help. Like, it's just making editors smarter about what things are. So if we can have type annotations in our libraries, that would be really beneficial. Yeah, absolutely, absolutely. So before you get going again, M.Causeer has a question for you. Yeah, yeah, I just saw that. This is about the consts.
Let me, yeah, let me read it, in case we have, sometimes we have folks that are only listening and not watching. So I'll read it off. It says, if you have repeating numbers like 255s and zeroes, is it a bad idea to make them consts? Are they already treated like consts internally? Yeah, so the question is basically, in this code on the screen, is there any benefit to me doing ALL_ONES equals const 255 and then changing these 255s? And the answer is no, because what const is doing is actually injecting the 255s: it's as if, wherever I use ALL_ONES, I had written 255. Right. And why this is important is, if instead what I had done is use a variable that wasn't const, the generated code for this would have to look up ALL_ONES in the globals dict. And there's a few steps here. You've got to load a qstr for the name of this variable, you've got to look it up, doing a dictionary lookup, and then put it on the stack and stuff like that. So that can be really slow. Whereas if I write 255, the generated bytecode just has a one- or two-byte instruction that pushes the value 255 directly onto the stack. And so const, if you're a C programmer, const is like #define in C. It's a substitution in the code, like a literal substitution. Right. If you don't have the underscore in front of the name, though, you will pay a RAM price for the name itself. Yeah, we still put it in the globals dict. So yeah, what you can do is, if you write it like that on the screen, ALL_ONES, you can use that in your code, but you could still import this file and still get at ALL_ONES from the file. So it kind of gives you a hybrid of both, which can be useful. Cool. So, we have a native decorator on this function, and it gave the wrong thing. So the first thing I always do at this point is I...
So is there a way to disassemble the native output? Exactly. Exactly. And that's the next fork we're gonna take. And depending on which way this fork goes, it will greatly change the outcome of your day. So... Well, it's just about the weekend for me, so. Fortunately, I know where this goes. So in MicroPython, sorry, I'm not as familiar with CircuitPython, but in MicroPython we have a port that runs on Linux, the Unix port, and it effectively is a bit like a CPython replacement. So we still maintain the Unix port because we get all of our really good tests. Yeah, of course. Very important for running. So that's the one that we do not get rid of. Okay. So, sorry, I should have built this before, but... You called it blaze. So for folks who don't know, Blaze is the internal Google name for the build tool. Jim and I are both former Googlers. I worked there for a long time and it was burned into my muscle memory that that's how I build things. That's hilarious. And what else would I call my alias for make in parallel? The open source version is called Bazel, for those following along. And at Google, running Blaze is like make -j6000, so it's pretty awesome. Okay, so now I have a MicroPython binary for Unix and now I can run that with that same B7, whatever it is. And then we hope, yes, same failure. Yay. Now I don't have to debug it on the board. And if the failure had only been on the board, then I'd know there's a problem specific to the ARM emitter. Whereas this tells me this problem is common to all of them, well, at least x86 or x64, but more importantly, I can just debug with GDB natively rather than wiring up an ST-Link or whatever. So doctor is asking, that -j is how many threads you're using to build, right? Yeah, it's the number of concurrent jobs that make will run at the same time. Yeah. And generally you want one per hardware thread.
Yeah, these days when you have nice modern fancy CPUs, you'll literally make your builds 16 times faster. Yeah. If anyone knows if there's an environment variable that you can set that just makes that the default, then I would love to know that. I know Ninja does it by default. Yeah. Ninja figures it out. And if you use -j on its own, I think it makes a sensible guess. So this is a useful thing to turn off; it's just to confirm that it does the right thing without native, which is good to know, and it does. So one thing the Unix port does that's really quite neat is you can run with the verbose flag. I think it's -v -v -v, and it tells you, you asked about the disassembly and stuff like that, what's going on. So I'll run through this quickly, because understanding this is pretty important for understanding how the native emitter is going to work. So we've run the compilation on this file. And there are really two functions here: one is the module itself, because at the end of the day, running a Python file is running all the lines in that file. And then each function itself is, obviously, another function. And so what we're seeing here is a code block for the module, and this is the generated bytecode. So this is the bytecode that tells it how to import micropython and run print(unpack(0)). And there's a little bit of a preamble here. So that might seem like a lot of bytecode, but the preamble is defining the module and all that stuff as well. And yeah, so this is MicroPython bytecode, which is different to CPython's bytecode, what it would store in a .pyc file, but has a few things that are similar; it's just a little bit more optimized for space rather than other things. And MicroPython's virtual machine is stack-based. And we see some stack operations up here.
So we do an import of the name micropython, and then we assign that to the global variable called micropython. So this store-name is a store to a global variable. We're gonna make a function, that's a pointer that we're gonna reference later. We're gonna, sorry, store the name unpack. So unpack, these two lines here are: make a function, push the result on the stack, then store the current top of the stack into a global variable named unpack. And if we looked at the actual bytes here, it would be the opcode for store-name followed by the qstr ID, so the interned string ID of the string unpack, which would be built into our... let's not go down those weeds, but the point I wanna make here is that the bytecode doesn't store the literal bytes "unpack"; it actually puts that into a table of unique strings, and then we just reference each string by its ID, so it'll be a two byte ID instead. Then we load from globals the name print, so we look up print in globals. So this is kind of equivalent to a dictionary lookup, but specifically in globals, because we're gonna do a print statement. And then we load the name unpack. And then we load a const small int zero, so that's push the literal value zero onto the stack, and this kind of ties into what we talked about before: because it was a literal zero, the value is encoded into the opcode. And then we call the function on the stack with this many arguments. So we expect the top of the stack to have a function and one argument, which we then execute. And then we call the function that's now on the top of the stack, which was print that we loaded earlier. It has one argument, which was the result of this previous one. We don't care about the result of print because we're not doing anything with it. Print returns None in this case, but we do nothing with it. And then the module itself is a function that returns None. So push None, return the top of the stack.
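For comparison, CPython's `dis` module shows the same shape for the module-level code: import the module, store the name, make the function, store it, then the print call. The opcode names below are CPython's, not MicroPython's, but the structure matches the walkthrough above:

```python
import dis

src = """import micropython

def unpack(packed):
    return packed & 1

print(unpack(0))
"""

# compile() only compiles; nothing is imported or executed here,
# so it's fine that the micropython module doesn't exist on this machine.
code = compile(src, "<demo>", "exec")
ops = [i.opname for i in dis.get_instructions(code)]
print(ops)
```

Like MicroPython's output, the listing for the module ends by pushing `None` and returning it.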
So one question is, how wide is the stack that this makes? Yeah, we are gonna talk about that later. Okay. There are two ways of the stack being implemented, right? There's intermingled in the C stack, and what we've turned on, which is we have the pystack enabled now. Okay, yeah, okay. Yeah, we're gonna talk about that too. Perfect. But yeah, there's actually three stacks that we're gonna have to think about. And then this is our actual function, the unpack function. And it's the bytes function from built-ins, from globals. Load, load, load, load, load. And now we see this load-fast, and load-fast is a load of a local variable. So I said that we throw away the names of all the locals; we figure out how many local variables we need, and there are just n of them in the function. And so this is the zeroth one. And it's had to generate a few temporary ones. But we see, we load a one and we AND it. So load fast zero in this case will be the packed argument to this function. So your locals are effectively your function arguments and any local variables you create after that point. So we've pushed the zeroth argument onto the stack. We've pushed one onto the stack, and then we call the AND function. It's a binary op instruction: it says, execute a binary op, and I guess it's a good time to look at it, it's in runtime.c, mp_binary_op. And so the binary op opcode is telling the VM: call mp_binary_op with the left hand side and the right hand side. And in this case, those will be the top two values on the stack, which will be the zeroth local variable and the literal one. Right. And so then the AND is encoded as an ID. So AND might be ID 3 for binary op 3 or whatever, I don't know what it is. Is that what the 24 is? It could be. 24 is the mp_binary_op ID for AND or whatever it is. Quite probable. I could check that. It's probably in an enum. Yeah, it almost certainly is.
So for those following, mp_binary_op is usually also implemented on a per-object basis. So you can have different things happen depending on what your left hand side is. Which might actually be where you're going, as in objint. Yeah, exactly. And so mp_binary_op then has to do all the logic of Python, which is: what is the op? What are the types? So there'll be a special case here, which is that if both of these types are integers, then we don't need to do anything special, these are integers; you can actually see it doing that thing with the bitwise operators here. But if they're not that, then you'll see at the bottom, we need to then go and find out what the types are. So the left hand side has a type. That type might define its own binary op, and it might do so because it's a built-in type, or it might do so because the type literally defines __and__, for example. And if the type were instance type, then instance type's binary op knows to look at the Python code and figure out if it has an __and__. And this ties into the native emitter because the native emitter doesn't know anything about any of this. All it does is know how to emit machine code that can call the same binary op function. That's where Viper comes in: Viper says, if the types were integers, then it will emit machine code for integers. I see. So this is a pretty nasty little function to look at the disassembly for, because it's got a couple of nested expressions and a bit repeated. But probably the most useful thing that we can do here is to reduce the error down to the simplest possible thing. So like, for example, what happens if we get rid of all these extra bits? So a question from doctor is, what is an emitter? And yeah, cool. So an emitter: the stages go, you take the input source and you lex it to turn it into a series of tokens.
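The dispatch Jim describes can be sketched in Python itself. This is an illustrative model of the shape of mp_binary_op, not the real C code: a fast path when both operands are plain ints, otherwise a fall back to the type's own operator protocol.

```python
def binary_op_and(lhs, rhs):
    # Fast path: both operands are plain ints, do the machine-level AND.
    if type(lhs) is int and type(rhs) is int:
        return lhs & rhs
    # Slow path: defer to the left-hand type's __and__, then the
    # right-hand type's __rand__, mirroring Python's operator protocol.
    result = lhs.__and__(rhs)
    if result is NotImplemented:
        result = rhs.__rand__(lhs)
    return result

# ints take the fast path; sets take the slow path via set.__and__
print(binary_op_and(0b1010, 0b0110), binary_op_and({1, 2}, {2, 3}))
```

The point of the sketch: the bytecode VM and the native emitter both funnel into one function like this, and Viper's trick is skipping it entirely when it knows both sides are ints.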
So, you know, this becomes print, open paren. So identifier, open paren, identifier, open paren, integer. That goes into the parser, which builds a structure out of that. So, you know, the first argument to print is this expression, and this expression is, you know... And then once you've got the parse tree, this takes the parse tree. And that's the stage that will tell you whether you have a syntax error. Exactly. So the lexer would have an error, for example, if you didn't terminate a string or whatever, or if you had a completely invalid character. Yeah, I'm trying to think of a good example of that. I'm pretty sure if I wrote something like an unterminated triple quote, the lexer would fail because it wouldn't know what token that should be, for example. Right. The parser would say, no, wait. I was about to say trailing comma, but is that valid? I can't remember. I think it is. That's an invalid expression in Python, and so the parser... like, there's no right-hand side to this plus operator. Right. And then once you've got a parse tree, you walk the tree and you emit code for the parse tree. And that's where you either emit the bytecode, and so the example we've been looking at here is emitting the Python bytecode for these functions, but we also have these two other options, which is we can emit native code or we can emit Viper code. And so the native and Viper stuff does not go through the Python bytecode first. No, it comes straight out. That's right. But the relationship between the Python bytecode and the API between the parser and the emitter is fairly close. It's not quite the same, but there's a lot of similarities. So we should take a look at that. Let me see. So this is the bytecode emitter, emitbc.c. And so, for example, the emit_bc binary op function. What this means is the parser has found a place where it needs a binary op to happen. Right.
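The same pipeline can be poked at from CPython with the standard `tokenize` and `ast` modules. This is illustrative only; MicroPython's lexer and parser are separate C implementations, but the stages and where errors surface are the same:

```python
import ast
import io
import tokenize

src = "print(unpack(0))"

# Lexing: source text -> token stream (identifier, '(', identifier, ...)
toks = [t.string for t in tokenize.generate_tokens(io.StringIO(src).readline)]

# Parsing: token stream -> tree. An expression like "1 +" dies here,
# because there's no right-hand side for the plus operator.
tree = ast.parse(src)
print(toks[:5], type(tree.body[0]).__name__)

try:
    ast.parse("1 +")
except SyntaxError as e:
    print("parser rejected '1 +':", e.msg)
```

The emitter is then the third stage: a tree walk that turns the parse tree into bytecode, native code, or Viper code.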
And so it will write out the bytecode for BINARY_OP_MULTI with this particular binary op. And so actually we can look up if that 24 was... I'm guessing, is that... not the in-place and... counting through the enum, 21 plus 3. Yep, that's 24. So that was BINARY_OP_MULTI. Right, right. And yeah, and so then we call emit_write. So the parser has called the emitter's binary op. And the wrinkle is that this happens through some preprocessor stuff, so there's no direct call to this, because what's really happening is compile.c. So compile.c is the thing that takes the parse tree and calls the emitter, and there, for example, is an in-place add; that EMIT_ARG is a macro that will ultimately call binary op like that. Yeah. And what it comes down to is, EMIT_ARG will figure out what the current emitter is and call the correct method on the current emitter, which will end up here. And so there'll be an equivalent in emitnative.c. There'll be an emit_native binary op. Right. And what this is gonna do is... I'll skip this statement and come back to that. It's a lot longer. This case is the common case. And it's saying, this is the emit_native emitter: tell the current architecture to pop into registers; we'll come back to that. And depending on the op, we emit a call, and this will literally, in whatever the current architecture is, emit the machine code that will execute the mp_binary_op function that we were just looking at before. And it's gonna do it with emit_call_with_imm_arg, so it's a call to some function: I want you to call the binary op function with this particular op, which was 24 from before. And I forget which side it assumes is the left-hand side or the right-hand side.
And if it was the not version of an operator, then we invert it and call the unary op function. But importantly, we're generating native code that is just calling back into our runtime, which knows how to deal with these objects. And all of the handling of what type of object this is, is still happening in the regular path that the bytecode would have gone by. And similarly in vm.c, this is the case for... so this is this entry, it's like a switch statement case. And so if the current op is a binary op, I pop the two things off the top of the stack and I call mp_binary_op. So in the VM we've done an if statement to figure out the current instruction and called binary op, but in the native case we've written machine code to explicitly call mp_binary_op. So MCauser asks, is native and Viper just at the method level, or can you do it at the class level too? I have no idea. That's a really good question. To some degree classes do function like functions. Yeah. In the same way that a top level file works like a function as well. We could find out, but at the point where you're using native on a class, you're generally better off... That's not per instance. That's just when you create it the first time, right? Like the body of a class is only run once. Yeah. So whether or not that would apply to the methods inside, I don't know. What you're probably better off doing in that case is putting it in a file on its own and then using the cross compiler to generate an .mpy. And one of the arguments to the cross compiler is: native everything, regardless of what decorators they have on them. And so that's what you use if you want to really optimize an entire Python file or whatever. And doctor was commenting on the laughing; that's just because my windows are open and there's people outside. Bruce says this is almost as low as you can go. And I totally agree.
And I also want to point out that, like... We're not even halfway yet. We're gonna start looking at disassembly soon. Yeah. Just hold your horses. I want to point out, for folks: you know me, I've been working on CircuitPython for like five years and I've never gotten this deep in my work. So it's really important to say that, as CircuitPython, this is the huge, huge, huge value that we've gotten from building on MicroPython. A lot of this comes from Damien and Paul and other MicroPython contributors. Somebody asked me one time, how would you do CircuitPython differently if MicroPython didn't exist? And I just said, I wouldn't have ever done CircuitPython without MicroPython. This stuff is way over my head and kind of not my forte. So this is why we should support the MicroPython folks like Jim and Damien, and why CircuitPython is what it is and feels like Python: because of all of this really detailed work that I am certainly not suited to doing, but obviously Jim and Damien are. So this is why we should support them. Thank you for saying that. And let's be really clear, this is all Damien. I jump into this about once a year, when there's an interesting bug, because my curiosity gets the better of me, but if you look at the file history, for example, in emitnative... it's popping up on my other desktop. But anyway, it's basically only Damien that works on this. And it is really cool. And so he often says, it's not clever. It just is a compiler. It's just the thing that someone would write. But it is... I actually really enjoy coming here because it's not that easy. I have the benefit of having prepared for this and reminded myself of how the details work. But there's a lot of things in here that work the way they have to work.
Like, you've got to implement binary op, you've got to figure out what the types are and stuff like that. It just takes so much detailed work, and that's definitely not my forte. That's definitely Damien's forte, the really detailed stuff. Like, the fact that there's a whole test suite on top of this that can make sure that it works as intended is just really, like, incredibly valuable. Like any software project, it's the combination. Like, I understand the detailed stuff; I enjoy the debugging and fixing, figuring out how it works and making small fixes, yeah. But sitting there and writing it from scratch, some people are really good at that. Yeah, the deepest I've ever gotten is really just into the allocation stuff, but we don't need to go there. Another time. So for the most part, emitnative doesn't know about the target architecture. Now you'll see, I'm making a liar of myself here, because directly above this is a bunch of special cases for Xtensa. And the reason is that Xtensa is really weird. Did you know that? So one of the weird things about Xtensa is the register windows. Yes, definitely. And did you know Dan was one of the people that came up with that? Oh really, awesome. Yeah, when I was complaining about it when we first added S2 support, he was like, sorry, that's kind of my fault, and linked me to a paper where he's one of three authors that came up with the register window stuff back with, like, RISC-I. It's just incredible the things that Dan's been involved with. It's so many things that we see, right? These really, really clever solutions to problems are really awesome if that's the only problem you have to solve, but when you also have to support x86 and ARM and whatever else... yeah, the Xtensa windowed ABI is one of our special cases. Oh, interesting.
Okay, so for the most part, emitnative only has a high level API to the individual architecture emitters. And so this is saying call with immediate argument, for example. So let's try and trim this down a bit. So what happens if I remove the other clauses? So I'll run it again... that's a syntax error. You've got the rogue plus on line eight. Oh, yeah, cool. Oh, that's the parser that does that. And okay, so without native it's as expected, and with native it's still wrong, which is good. Okay. Let's just do a quick sanity check and make sure the world is not completely broken. Yeah, like, do we have to call bytes? Do we have to do anything, right? Well, that's okay, we need to return. And that returns zero. Which is correct. Which is correct. So that's good. So yeah, exactly as you said, the next thing is: do we need the bytes? And the answer is no. That first one. And does it matter what value this is, if I change this to one or zero? That's not important. But what if I get rid of that? As expected, now we should be back. That's correct. Yeah. Okay, so what's this line doing? What if you do it as the first argument, the first item in the tuple, not the second? Good question. I'll make that zero again. Good. So it's correct. Very good, very good question. Yeah. So, what if we go back to what doesn't work and we make a local variable and then do it? A little bit. So you'd think, if you looked at an optimizing compiler, you'd say that those two are the same thing, but this is not an optimizing compiler. It's going to emit exactly what you wrote. And we can look at that if I run with the verbose flags again. This function... sorry, it's native. Looking at the bytecode version, it is storing it into local variable one. So zero is packed and one is a. And this is not optimized: it stores it and then loads it again. Yeah. Stores it on 13 and loads it on 15. Yeah. Okay, so that tells us something. So we've got sort of two candidates here. One is something is going wrong when we make it a tuple.
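CPython's disassembly shows the same non-optimization: routing a value through a local really does store it and then immediately reload it. A sketch with `dis` (MicroPython's bytecode behaves the same way here, with its own load-fast/store-fast opcodes):

```python
import dis

def inline(packed):
    # The expression feeds the tuple-ish argument directly.
    return bytes([1, packed & 1])

def via_local(packed):
    a = packed & 1          # stored into local slot "a"...
    return bytes([1, a])    # ...then loaded right back out

ops = [i.opname for i in dis.get_instructions(via_local)]
print([op for op in ops if "FAST" in op])
```

Both functions compute the same thing, but `via_local` carries the extra store/load pair, which is exactly why introducing a local can change which emitter code path gets exercised.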
So maybe we'll make it a list instead. And yes, it's something about it being the second argument and going into another structure, but only when it's inline in that definition, not when it comes in through a variable. So at this point, we've got a couple of options. I'd kind of like to rule out the tuple or the list constructor going wrong, or at least confirm that by the time it gets to the list, it's got the right thing. We've got a couple of options, but for me, I always just reach for GDB. I could look this up, I could just step through this, but... So I'm going to break on mp_obj_new_tuple... oh, sorry, got to build with the symbols. And break on mp_obj_new_tuple. And you're going to see what it's given. Exactly. I might just make that a bit bigger while we do this. So I've broken on mp_obj_new_tuple, which I guess I should put on the screen. So mp_obj_new_tuple takes the number of elements and the items. And so I received two items as expected. The items are a list, so let's print items, pointing to where that list is. First item is a number, and the second item is a number. Now this is a good time to cover a pretty important concept. These values might be surprising: I'm calling new tuple, and remember the output that I expected was one, zero. But I'm seeing three and, well, something that's both not zero and not 0xFF; I've got this other thing. So what's going on here is, these arguments are objects, mp_obj_t. And probably, if you walk away with one thing from this talk, an understanding of what an mp_obj_t is, is probably the most useful thing. It kind of broke my brain when I first started learning about MicroPython that this is true, but it is actually true. Everything in Python can be represented in a variable. Anything I could stash into a local variable or into a dictionary or whatever is an mp_obj_t. And the crazy detail is that an mp_obj_t is just an integer.
It's a little bit complicated because there are different representations, but in all the different representations, they are just numbers. And depending on the architecture, it's either a 32-bit number or a 64-bit number. And the full explanation of how this works is at the top of mpconfig.h, and the reason for that is that there are four different choices for how mp_obj_t's work, and they make sense for different trade-offs and different architectures. And the default is repr A. I think on stm32 it uses the default, which is A. I think in CircuitPython, we're exclusively C, but I could be wrong. Okay, yep. And the reason for C is you get floats. It's a huge optimization for floats, but let's start with the simple one. So anything that can be stored in a variable internally in MicroPython is represented as a 32-bit unsigned integer, and depending on the pattern of bits in that number, we figure out what the type is. So it's basically type tagging. So if the last bit is a one, then this is a small integer. A small integer is a number that's small enough that we don't have to allocate it separately on the heap. And this is really great because it means you don't have to access the heap to do calculations on integers that are small. And so a small int is everything that you can fit when you take away one bit. So 31 bits left over. This is why... This is why precision is 31 bits, not 32. Exactly, yeah. This is why, when we're talking floats or ints, Dan and I are like, it's not 32 bits, it's like 29 bits or 30 bits or whatever. That's because of this. The user doesn't care about this, right? So I can take two to the 31 minus one, I can add 10 to that, Python doesn't care, but behind the scenes what happened is that that object is no longer stored in the mp_obj_t. It's now an object that's on the heap, and we'll come to that.
And it takes care of this automatically. You can build MicroPython without support for this, in which case you are limited to just small integers. Right, we do have some builds in CircuitPython that do not have long int support, for example. So like Trinket M0, for folks who have run into that, this is why. The next representation is: if the last bit is a zero and then the next two bits are zero-one, this number represents a qstr. And so this is the interned strings that we talked about before. And so the value of this thing is an index into the string table. This is a huge efficiency optimization for strings, because remember strings are two things: it's the literal strings that your code is using, but it's also the name of every global variable, every function, every identifier that's not a local variable name. So global variable names and function names and stuff. And so whenever I call a function unpack, I have to load unpack from the globals dict. And the way I do that, as we saw before, is I put that qstr into the opcode. And then the next representation is an object, in which case, other than those last bits, the value is the pointer in memory. So you can think of it as being left-shifted and then OR'd with that bit sequence. And there's more nuance there, but let's not go into that. So David had a question that says, is there anything else than MicroPython that emits MicroPython bytecode? Like at one point everyone was making JVM bytecode, even not being Java. Yeah, oh, interesting. Okay, so the simple answer is the cross compiler, but that is technically just MicroPython. So you run the cross compiler on your PC; it takes the .py file and generates a .mpy file to run on MicroPython. The way that works is, it is just the Unix port of MicroPython, or the Windows port, with a special option enabled that can write out the .mpy file from the in-memory representation.
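A toy model of the qstr idea (purely illustrative, not MicroPython's actual implementation): each unique string is stored once, and everything else, including the bytecode, refers to it by a small integer ID.

```python
# Toy qstr table: intern each string once, hand out stable integer IDs.
class QstrTable:
    def __init__(self):
        self._strs = []   # ID -> string
        self._ids = {}    # string -> ID

    def intern(self, s):
        # Return the existing ID, or add the string and mint a new one.
        if s not in self._ids:
            self._ids[s] = len(self._strs)
            self._strs.append(s)
        return self._ids[s]

    def lookup(self, qid):
        return self._strs[qid]

qt = QstrTable()
a = qt.intern("unpack")
b = qt.intern("unpack")   # same string -> same ID, stored only once
c = qt.intern("print")
print(a == b, a != c, qt.lookup(a))
```

This is why a store-name opcode can carry a two-byte ID instead of the literal bytes of the identifier.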
I've not seen anything else that does this, though. MicroPython's bytecode is a bit different to the JVM's in that it is much more geared towards Python. I could imagine it would be possible. That would be quite interesting. And it's cool to think, what if you wanted MicroPHP, MicroPerl, MicroRuby; could you write a Ruby or Perl or PHP to Python bytecode compiler? I don't know. Yeah, what I have seen is Blockly to bytecode, but it goes through the intermediate step of generating Python, so that doesn't really count, I guess. So the takeaway here is that everywhere in the runtime and the VM and the emitters, everything that you can represent in Python is a 32-bit integer. And depending on the bit pattern, the way you interpret that 32-bit integer changes. And when it's not a small int, not a qstr, it's a pointer to an object on the heap, I should say. It's not even, it could be in flash. Oh, I'm wrong, yeah, good point, yes, yeah. It's just a pointer. And I never get it right which way it's Harvard or von Neumann, but because MicroPython is assuming the one where it's all the same address space, that works for flash and RAM, yeah. And so when we're looking at the debugger here, the reason that the value we expect to be one is actually a three is because it has been shifted to the left one and a one added on the end. And so that's why we get three. And similarly, 0xFF shifted to the left one with a one on the end is 0x1FF, because that's just nine ones. Exactly, yeah. And so that tells us that the problem is not in the generation of the tuple. The tuple is being generated with wrong information. Right. Because the number was already incorrect. Right, because that should be a zero. Or it should be a one, right? This should be zero. No, it should be one, because it needs the one for the small int. Yes, you're right. Sorry, yeah. And of course, I've changed this as well, obviously. Exactly.
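The arithmetic Scott and Jim just did in the debugger can be written down directly. This is a sketch of small-int tagging as described above (shift left one, set the low bit); the exact layout is representation- and port-dependent:

```python
def encode_small_int(n):
    # A small int is stored shifted left one bit with the low bit set.
    return (n << 1) | 1

def is_small_int(word):
    # Low bit set means "small int" under this tagging scheme.
    return (word & 1) == 1

def decode_small_int(word):
    return word >> 1

# The values seen in GDB: 1 shows up as 3, and 0xFF shows up as 0x1FF.
print(hex(encode_small_int(1)), hex(encode_small_int(0xFF)))
```

So a word like 3 in the debugger decodes back to the Python value 1, while an even word would be interpreted as a pointer (or qstr) instead.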
More work required. And what's worth looking at here is, if I do a backtrace in GDB, it's a disaster, because despite the fact I have symbols, it's come from code that was generated by the native emitter, and that doesn't follow the stack frame layout that GDB expects. So it has no idea how to backtrace. But I could go up one. And it might work. No, it just doesn't know. GDB just doesn't know what to do. What I could do is I could actually just dump the memory at this location and force it into the thing, but there's a better way. emitglue. So, has anybody done handwritten bytecode to do something smart or super optimized? I think generally the answer is no, because if you wanted optimization, you would not do it at the bytecode level, right? Yeah. Thinking about that, it's a good question. The bytecode quite closely matches the Python you write. So if there were a more optimized bytecode, you could probably just write much uglier, but otherwise the same, Python. One long line where you have no local variables. Yeah, something like that. On the other hand though, for the native emitter, if you want to write more optimized native code, I don't know if CircuitPython uses this, but we have these dynamic native modules. So you can make .mpy files from GCC output. So you can write a function in C and have it as if the native emitter generated it, and whack it into an .mpy file. If you want really highly optimized native code running dynamically on the device, then that, yes. And yes, we use that in a few places. And I think we did get that with the merges, and I think Jeff revived the tests after I deleted them. So I think it does work, although I don't use it at all. And I honestly don't recommend it. But if you want to go there, you can go there. It's just, you're not going to get help from me or anybody else. Yeah.
And there's an experimental thing that we're doing as well that hasn't gone very far that we'd like to pursue, which is writing parts of the core in Python. So in the same way that Python calls into native code, what we're looking at is having the actual native code of MicroPython, like the VM and the runtime, call into functions that are themselves written in bytecode. This goes back to the thing I said earlier, which is that the bytecode is more efficient in terms of space. And a lot of functions that are written in C in the core, just because they need to be accessed by the core, don't really need to be written in C. Like the pull request that Damien made for builtin sum, for example. So, you know, sum of one, two, three is currently written in C in the firmware, but there's no reason that couldn't have been written in Python and been much more efficient. Memory efficient, not execution efficient. Exactly, yeah, flash efficient. And so what we're looking at is some way to literally write inline Python code in the C code. Be pretty cool. In the C code. Yeah, we'll see. You could do a similar trick to the way we have Python stubs in our C code: just make it a comment, but with some prefix that designates it as Python or something. Yeah, we'll see, but yeah, there are some pretty good efficiencies to be gained there. So, I didn't know this off the top of my head, but what I did was... I didn't know about MICROPY_DEBUG_VERBOSE. That's a quite useful flag if you're trying to figure out the internals of the VM and the runtime and stuff like that. So when you turn this on, it turns on a bunch of things, but the most useful thing is it turns on this WRITE_CODE. So this is emitglue.c; it's kind of in between... my son is awake and he knows where the turn-off-all-the-lights-in-the-house button is. I was going to say, we should keep going here. I think your family is starting to wake up.
So that wasn't like a passive-aggressive... I might get BBC data at some point, but we'll see. I don't think we're that popular. Sorry. So yeah, you can turn on MICROPY_DEBUG_VERBOSE, but the one I care about is WRITE_CODE. And hello, Taylor's come to do some Thumb assembly with us. Awesome. You can tell that we've been here before trying to debug generated assembly, because there's a little bit of extra code at the end — and this is why being able to do this on the Unix port is useful as well, because it's literally using POSIX fopen, which wouldn't work on hardware. So I'm just going to do the lazy thing and force it to run, right? And I do that sometimes too. It's like, why would I set the flag when I could just comment it out? I'd have to figure out what those if statements up there do, and it would be complicated. And yeah, I can run my code again and... You've got an extra F. You just want to uncomment the last three lines. Those three, yeah, uncomment those. Right, because the WRITE_CODE one is just through 166. You're right. Yes, thank you very much. And this one I could turn on as well — this is actually what we got when I ran it earlier. Basically that would print the machine code as bytes, which I have no interest in looking at, whereas this writes the same bytes out to a file, which means I can use tools. Right. And now when I run my program with the failure, I will now have an out-code file. E. What? The first key. The first... The E for Taylor. Mm-hmm. Did that actually run? Emitglue compiles? See. Ah, for Taylor. What am I seeing? Am I running it from the wrong directory? Ah, for Taylor. It looks right to me. I'm for Taylor. You could put a print in there to make sure that you're running it. I'm running the right micropython and I'm running the right script. I like that. I love this. I'll do that. First key for Taylor. Did I turn it on? Yes.
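The WRITE_CODE pattern — dump the raw generated machine code to a file so external tools can pick it up — can be sketched in Python. The bytes and file name here are illustrative, not what emitglue.c actually produces; the point is just that once the bytes are on disk you can run something like `objdump -D -b binary -m i386:x86-64` over them.

```python
# Sketch of the WRITE_CODE idea: write generated machine code to a file so
# external tools can disassemble it.  The bytes below are a hand-picked
# x86-64 snippet (mov rax, rdi; ret) purely for illustration.
import os
import tempfile

machine_code = bytes([0x48, 0x89, 0xF8, 0xC3])  # mov rax, rdi ; ret

path = os.path.join(tempfile.gettempdir(), "out-code")
with open(path, "wb") as f:   # emitglue uses POSIX fopen(), hence Unix port only
    f.write(machine_code)

print(len(machine_code), "bytes written to", path)
```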
So David asks: is there a bytecode-to-Python decompiler? I don't know, that's a really good question. Dave — I keep saying that, but they really do have good questions. I have seen one. There's somebody named Kevin on your Discord as well who's been looking into this. Kevin Waters. I'm not sure I've heard the name. It would be quite useful. I've not seen one that's fully featured. Generally, in a situation where you need to know what the disassembly would have been — so, for the people working on MicroPython — you're also generating the code, so you can just turn on the verbose output. Right, you have the source. Yeah. Let's build that. I'm going to put this before Benny. Before Benny. Oh, sorry, I see what's going on. This whole... It's before Benny. It's my fault for being lazy. This whole section is so... Oh, it's in the word. So I'm going to... This is a whole other level of pair programming, huh? Benny — he's pressing T and B for Taylor, and Benny is his best friend. Awesome. Well, he's learning to type really, really young there. Oh. Me too. There we go. Oh, there you go. Do you want to go get some breakfast? Yeah. Nice to see you. Can you say goodbye to everyone? Thank you, Taylor. Bye. I know he can't hear me. Okay, so now I have our code. Somewhere in my history — well, I'm not going to remember that off the top of my head. So now I can use objdump, which is great, because I don't have to look at the... So it's printed out the whole thing, which has lots of zeros, but that was my native code that was generated. Okay. Now, someone more skilled at this than I am could probably look at this and figure out the bug. But I couldn't. And it's not me either. I have spent enough time looking at this that I can tell you roughly what the structure is. You squint. And, because we know roughly what to expect — we know what this function does. Right. And there's a test instruction, right?
So this has got to be the branch — the if statement. So test followed by je. I don't see anything else that isn't moves and calls, so that's got to be the if statement. And we recognize a few things as well. For example, the 0xFF is exactly what we'd expect to see; we get the 255 case. Right. So if we think about what's going on: we're jumping to 70, and 70 is down here. So we're jumping over the code that sets 255 in the object representation. And otherwise we do the... and we jump over that to 88, which is there. So at 88 — otherwise, we do the one, which is zero. So I wish I had done this when I first looked at this, but if I had spent the time to really think and understand what this code is doing, you'd see that this is the assembly code for this conditional statement. Right. And this is x86-64, right? Exactly, yeah. And we could generate the ARM code for this as well, but I know x86 assembly better than ARM, so I'd much rather work in x86. It's the opposite for me — I've definitely been exposed to Thumb more than x86. Yeah, right. With x86 I'd be a bit lost. Okay, and so these calls are the calls back into the runtime. So what's happening here is you're loading into rax — the 64-bit A register — an offset of 0x80 from rbp, and then calling whatever we get as a result of that. And — showing you the code for why this is the case is going to be tricky — what's happening here is that rbp, when you're running native code, points to the function table. Basically, the problem here is that there's no linker. We can't just say `call mp_binary_op`; we have to do all of that ourselves. So on entry to native code, we set rbp to be the function table, and then 0x80 is the offset of a function in the function table. And so I can actually look up in the function table what's at 0x80.
So one thing I thought would be interesting for folks — we were talking about code size. If you look at this output, the left-hand column is the address, and then the next column, the one that varies in width, is the actual bytes of the instruction. If you were looking at Thumb, you would basically only have two numbers there, because each number is a byte and Thumb instructions are two bytes — well, all Thumb-1 instructions are two bytes; Thumb-2 adds four-byte ones. Whereas this varies because x86-64 is a variable-length instruction encoding. And then the next column is the human-readable version of that. Yeah, exactly. Good point. So this is the offset, like you said, then the bytes, and then the disassembled version. So this is something I do a lot when I'm working on MicroPython — I should rely on my editor more; I use Sublime, but it doesn't always figure out the cross-references and stuff, because there's a lot of preprocessor stuff going on. I mean, you use it way more than I do, I'll tell you that much. So, for example, if I wanted to find this new-tuple entry... it's a bit confused, but... This is the function table, so these are the offsets. And what we'd expect to find is that 0x80, which would be... 0x80... This is where I just have to count. Yeah, yeah. I think there's an implicit offset as well. But what we'll find is that 0x90 is mp_binary_op, and then 0x10 less — 16 bytes earlier in that table, so two 8-byte words — is obviously is-true. Because I've been here before. But yeah, we expect our code to have emitted a binary op, converted the result to true-or-false, and then done this comparison.
So what our emitter is doing is saying: load a bunch of stuff, binary op, do a bit more setting up, call mp_obj_is_true, turn that into a 0 or 1 — because we don't know that the binary op returns a 0 or 1, or true or false; it could have just been an expression that returned an object, and now we need to know whether or not it's truthy. And so mp_obj_is_true will turn that into a 0 or 1. And so then rax is both the argument and the function we're calling — the convention is that we also return in rax. So what happens here is that when we call mp_binary_op, we've loaded the arguments, we call it, and we stash the result of that function into rdi, which will be what we use as the first argument for the next function. We then load the next function, call mp_obj_is_true, and the result — whether it's true or not — will now be in eax. And then what we're expecting to do is see if eax is 0. This is a common x86 convention, because it's the minimum possible representation: since we know all the other bytes are 0, we look at the low byte only, see if it's 0, and jump if it's 0. So, what's going on here? Randomly in the middle of this, when we're calling mp_obj_is_true and checking the result, we write 1 to eax, then do some adjustment, and then write the 64-bit extension of eax into an offset on the stack. And I wish I could tell you this is how I figured it out the first time. I did a lot more printfs and a lot more GDB-ing, because I wasn't comfortable enough looking at this as assembly and figuring out exactly what was happening. But what I eventually found is — I turned on print statements for everything that the native emitter did. So that would be one thing that the native emitter did, that would be another, that would be a call to mp_obj_is_true, that would be a call to mp_binary_op.
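The clobber just described can be modeled with a toy register machine: the result of mp_obj_is_true lives in eax, and the later `test al, al` decides the jump. If something writes 1 into eax in between, a false condition is mistaken for true. (The model below is illustrative only.)

```python
# Toy model of the bug: writing an immediate into eax between the is-true
# call and the test flips the branch decision.
def run(settle_clobbers_eax):
    regs = {"eax": 0}
    regs["eax"] = 0              # mp_obj_is_true returned "false" (0)
    if settle_clobbers_eax:
        regs["eax"] = 1          # stack-settling immediate lands in eax
    taken_else = (regs["eax"] & 0xFF) == 0   # test al, al ; je .else
    return taken_else

print(run(False))  # correct: the else branch is taken for a false condition
print(run(True))   # buggy: the immediate overwrote the result
```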
And if we look at it in the case of emit_native_binary_op — in that case we looked at before, it actually says emit_pre_pop_reg_reg, then an emit_call, and emit_post_push_reg. So when we go and look at emit_native_binary_op, here: this is the setup of the arguments, the pre-pop, this is the emit_call, and so this whole section here — sorry, this whole section there — is what you get as a result of the compiler telling the native emitter to emit a binary op. I'll mention Viper just for completeness: the other case here is when the types are all integers, and Viper knows their types are integers, and so instead of emitting a call to the binary op, it literally emits a left shift — and there'll be all the different ands and ors and all that sort of stuff. And that's why Viper is so fast: instead of emitting all of that to call a function, which is a whole lot more work, it literally just emits the instruction. It just does that — and it knows whether it's unsigned or signed. Okay, so where does this come from? Folks are commenting in the chat about how deep of a deep dive this is, and let me reassure folks: this is certainly the deepest I've ever gone on CircuitPython — I've been in industry since I graduated in 2009, and I've never gone this deep, even for stuff I did at Google beforehand. So hang out, this is a deep dive, enjoy it, don't think that you need to follow all of it. Just like all the other stuff I do, you can figure it out given enough time. Like I said, I'm condensing a day of work, and I have done this before as well. And the cool thing about MicroPython is, approximately zero people need to understand how this works. Approximately two people. Jim and Damien.
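The Viper point above can be sketched as a toy emitter: when both operand types are known machine ints, emit the shift directly; otherwise marshal arguments and call into the runtime. The instruction strings and function shape here are illustrative, not MicroPython's actual emitter API.

```python
# Toy emitter illustrating why Viper is fast: known int types collapse a
# runtime call down to a single instruction.
def emit_binary_op(op, ltype, rtype):
    if ltype == rtype == "int" and op == "<<":
        return ["shl r0, r1"]                    # one instruction, no call
    return ["mov rdi, r0", "mov rsi, r1",        # marshal the two operands...
            "call mp_binary_op"]                 # ...and call the runtime

print(emit_binary_op("<<", "int", "int"))
print(emit_binary_op("<<", "object", "object"))
```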
If you are watching this live stream and you're like, "this is exciting and interesting" — come talk to us about the RISC-V emitter, or about figuring out a way to optimize this, right? If you're familiar with this stuff, you're probably screaming at this disassembly and going, "what the — this is terrible." And yeah, optimizing this would be really cool and would have some huge performance benefits for MicroPython. Right. I might skip through what's happening here, because getting to the bottom of this is a bit complicated — but I mean, that's the trick, right? Yeah. So we now know what's at fault. We've mistakenly — in assembly programming we'd call this clobbering — we're using the eax register for a particular purpose, and between using it for one thing and using it for the next, we've used it for another thing. Right, you're overwriting it. And so what we know is that we're in the middle of the ternary operator. And I'm going to cheat ever so slightly, because I know that we're in the emit_native_jump_helper. The way I would have figured this out is either with the debugger or with print statements: find out what happens immediately after the binary op — sorry, immediately after the is-true — and before the jump. Right, so you have this value-if-condition-else-other-value. And the first thing to do is figure out what the condition is, right? So what you're saying is: okay, we've finished evaluating the condition part of that expression, and now we've gone one level higher, and now we're going to figure out, based on that, how to produce a value. That's exactly right.
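The shape of the code generated for a ternary like `x if cond else y` — evaluate the condition, convert to a truth value, test, and jump over the two arms — can be written out as a toy codegen. The mnemonics and label names below are illustrative, matching the test/je structure read from the objdump output earlier, not the emitter's real output.

```python
# Toy codegen for `if_true if cond else if_false`: the same test/je shape
# we saw in the disassembly.
def emit_ternary(cond, if_true, if_false):
    return [
        f"call mp_obj_is_true   ; {cond}",   # condition object -> 0 or 1
        "test al, al",                       # check the low byte only
        "je .L_else",                        # zero means take the else arm
        f"mov rax, {if_true}",
        "jmp .L_end",                        # jump over the else arm
        ".L_else:",
        f"mov rax, {if_false}",
        ".L_end:",
    ]

for line in emit_ternary("x > 0", "1", "255"):
    print(line)
```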
And so with a bit of work — debugging and printfs — I found the thing that emits after the is-true. I mean, okay, we know it's got to emit the is-true, so... we did need to get up early this morning, we're in the weeds... but what I would do is put a breakpoint on the code that emits the is-true and then see the very next thing it does. And the very next thing it did was call the native jump helper, which is to be expected — we're emitting a jump, because it's an if, same as all the other jumps. So once again we can ignore the cases where the types aren't pyobjs, because we're not using Viper, so all types have to be pyobjs. And so we emit a pop of the register, which means that we need to take the current value off the stack — this is the value that we're going to be making the decision on. This is a fancy way of saying whether or not I want to push it back on the stack. So this moves the stack back again — since we don't have a peek, it doesn't matter: we get the top value off the stack and then we might adjust the stack back, so it's back on the stack again. Exactly, yeah. And then we call mp_obj_is_true, which is what we expected, right? This is taking whatever object was returned and deciding whether it's true or not. So at this point we are here: we've called mp_binary_op, and now we're figuring out if it's true or not. And so what's the next thing we do? It's kind of amazing when this happens, but the code actually tells us exactly what's going wrong here, because it's the bit where we do the jump: we do this thing, and right in between, we do something else, right? We've set up the register with the result of mp_obj_is_true, we're going to make a decision on that — and we did other work in the meantime. Right. So here's where I have to explain three different stacks. The processor has a stack, sort of natively — there's a bunch of instructions that work with it, like push and pop and stuff
like that — and that's the same stack as the C stack; when you call a function in C, that's the stack it operates on. In MicroPython, the function itself has a stack, which is not on the C stack — although that is a compile option you can enable. So — what I haven't gone into detail on — so where does that live? On the C stack. But what happens is that the emitter actually goes through multiple passes, and on an earlier pass it's figured out how many local variables there are. And that crazy detail I mentioned earlier is that all objects are just these mp_obj_ts, which are all just machine-word numbers. We figure out how many local variables there could possibly be in every possible branch of the code — that's how big the stack needs to be — and we make room for that on the C stack. But it is not managed by the C stack, right? It's not using the stack pointer register and so on; we have our own stack pointer. And if it's too big to put on the stack, we put it on the heap instead — there's a little code path that does that. Then the emitter itself has a stack that manages its use of that runtime stack. And so now this is where there are a few cool optimizations: if we can avoid emitting code for manipulation of variables, we don't. A good example is, if I push an immediate onto the stack, I can just not emit that push if I'm immediately going to pop it off afterwards, and things like that. So the details here are really complicated and messy, but the point of this function here — and this comment explains it pretty well — is that we have these three different stacks, and we're about to do a jump, and we need to synchronize them to make sure they're all consistent, because if we jump, things will be messy. And so need_stack_settled basically asks: do I have stuff in the emitter stack that is not currently codegen'd into the Python stack? And it turns out we do. And this is where this
particular case is really interesting: what this expression does is make a tuple out of these two values. So if we simplify it — two immediates on the stack — it calls mp_obj_new_tuple with a pointer into the stack, offset by two, and it pops two values off the stack and makes a tuple out of them, right? And so — the third stack: there's the processor/C stack, there's the MicroPython stack, and then there's the emitter stack. Yeah, but the emitter stack only exists while we're doing the emitting; it's not a runtime stack. But it is also managed as a stack, and it can be modified by the emitter, and the emitter can then realize that it didn't actually need to emit operations to modify the Python stack, the runtime stack. Right. It's a really neat optimization that saves a lot of generated code. And so what's happened here is that we get to the end of putting these things on the stack without actually generating any code, and then we realize we're about to emit a call to new-tuple, and we're like: oh, okay, we've got to actually get them onto the stack so that the new-tuple call can access them. And what's happening here is that we emit an immediate that temporarily doesn't do anything, then do this if statement — but now it's like, oh, we've got to settle the stack so this value really does get onto the stack. Because this is a really simple example, but these two branches themselves have sub-branches and do other things with the stack, and depending on which branch we take, we need to do different things. And so need_stack_settled just goes through the emitter stack — if we have any registers... And the other optimization that's really cool is that we can avoid using the C stack — the Python stack — entirely, and put things in registers instead. So some local variables will in fact never touch the stack; they'll be kept entirely in registers. Right, so the
emitter can realize that this value never even needs to land on the stack at all. And the benefit — to tie it back to what we talked about earlier — is that the closer you have your memory to the CPU, the faster it can run. Registers are the fastest memory you can use, because they're in the CPU itself, whereas the stack is going to live in RAM — though for complicated CPUs there may be layers of caching between the actual RAM and the CPU as well — and there are orders of magnitude of difference. There's a really awesome page — "Latency Numbers Every Programmer Should Know", something like that — and it gives you the orders of magnitude of what these things cost. And not only is it faster, it's fewer instructions to access, right? Because if I want to access the value in a register, it's just a mov or an add or whatever, but accessing the stack involves moving things around. So it's worth zooming in, in Sublime, on what's happening here: if we have stuff in registers that we now need to put into the Python stack, we just emit a move, from the register onto the stack. That's not going to clobber anything, because you can do that directly, register to stack. If I have an immediate in the emitter stack — so this is the number you were holding on to, that you didn't know exactly when you'd need. We're going to wait to do something with it until we know we need it. Exactly. And what I'm going to do is load it into a register, because at the moment it's just ephemeral, right, it's only in the emitter stack: load it into a register on the CPU, and then copy that register into the Python stack. And I'll give you one guess: what do you think REG_TEMP0 is?
It's the return register — rax, exactly, for the x64 port. And so, as you can imagine, keeping all this state in your head and figuring out all possible paths and how these functions all interact is so hard, right? That's why this is a subtle bug that's probably existed for many, many years. And when you look into this in detail — and I'll save you the analysis, because there are quite a lot of paths you have to follow — this is the only path that calls need_stack_settled with an outstanding immediate. So, fortunately... What's the name for that? Like an invariant: you should only call need_stack_settled if none of the registers are live. Exactly. And so I had a chat with Damien today about this issue, and one of the conclusions was to note that — yeah, and then, like a lot of proper compilers, there are lots of cool ways you can have metadata to say "this thing clobbers that thing." And in general in MicroPython, this whole pre/post thing — you've probably seen a bunch of emit_pre and emit_post — is all about that. And here, caller-save versus callee-save: it's about the guarantees, the invariants that we provide — these registers, you can't rely on them being preserved, whereas these ones you can — and we are breaking that invariant. Right, and that's known as a calling convention. Right, so yeah, if you look at, say, RISC-V: they have an ISA, but then on top of that they have calling conventions — like, if you compile C code, here's how we're going to use these registers in a standard way, for exactly this reason. It's tied to the ABI for the architecture. Yeah, the application binary interface, exactly — which goes along with the calling convention. And so there's a couple of options here. Let's write a test — but, so, here's another case where this is used, but in this case
it doesn't matter, because there's no temporary state that we need to worry about being used later — well, there is temporary state, but it's not used; the result doesn't matter there, right? But in this case, it does. So we've got a couple of options. One is that we could settle the stack earlier — we could spend a bunch of time analyzing all these call sites, and I think I convinced myself that you could call need_stack_settled here, because we know that mp_obj_is_true probably won't mess things up. But the much simpler alternative is: we know we're going to clobber rax, but we're not going to clobber rdi and rsi, and they give us the same guarantees. So the solution, like all good bugs, is a two-character fix: you just change which register it goes into. Yeah. Okay, and so now if I run it, I get zero zero. And if I look at the disassembled output after I run it, those two instructions will still be there, but they'll be using a different register. Exactly — they're still there, but now I'm moving into edi instead. And then I guess I should check — it's probably your debug print; I always leave my debug prints in, that way... So, to tie things up, obviously what I should now do is write a test. And the way this works, if we haven't seen it: the way our tests work is we have these Python files, and we expect them to run the same on MicroPython and on CPython, on different platforms. And if we can't run the code on CPython, then we provide what the expected output is. So I would write a file here called, like, native-stack or something like that, come up with a few test cases, and add it with this PR. Cool. So that's flashed to the PyBoard, and I should be able to get zero on my PyBoard as well — the same fix fixed it for Thumb. Right — which is a relief. And if I check REG_TEMP1 on ARM and Thumb, it is a different register. All platforms provide three temp registers, and it is a different register there too, so that's to be expected as well. Right, different from the return register. Confusing — yeah, yeah: temp zero is often the same as the return register, but temp one and temp two will be different. That's a good way to think of it. Awesome. So, what should we do to wrap up? First, if folks want to follow along with this, what is the issue number? Oh yeah — micropython issue 7523. I think I even have it open — 7523, I think so, yeah. Cool. So we're very grateful to twisted roid ambassador for raising this in the first place — and it was a really, really nice minimal case; really grateful that it wasn't, you know, 100 lines of code that we had to narrow down. And then — I should have done this in the live stream, but — this shows a bit more detail about what's going on with the stack: I put in a few more variables, and then you can really see that they are the small int values. Sorry, this is probably hard to — oh, I see what you're saying, there's more stuff that gets settled. Yeah, exactly, and so you'd actually see a lot of settling happening, clobbering rax as it goes, but it's otherwise the same. And at the point that I wrote this, I didn't know what the solution was. I spoke to Damien to confirm that that was in fact the right thing to do. And after this I'll — I just haven't gotten around to making a PR for it yet. It is Saturday for you, so you shouldn't work. Oh yeah, I'm not doing it today, but I will. Matt points out that this is an example of why it makes no sense to pay devs by the lines of code they write. Yeah — the time is in figuring it out. It's not the most exciting single — well, it's a two-character change — but I do have a story from a long time ago where, not me, but a co-worker added an ampersand
to a file — it was C++, so it changed something from being pass-by-value to pass-by-reference. It was a very, very hot code path, and it was just a typo — somebody had missed the pass-by-reference — and it was, you know, hundreds and hundreds of thousands of dollars a year worth of time saved as a result. So there are lots of these; they can be pretty good. Yeah — I remember there was a time when I was at Google and there was a loss of Gmail accounts, which they eventually restored, but it was all caused by a missing asterisk, I think. I remember that one. Great. Well, Jim, thank you for taking us to the deep depths of MicroPython. As I've said earlier in the stream, this is why we build off MicroPython, why we try to support MicroPython as much as we can, and why I think everybody watching should be doing that as well. Go ahead. So, I guess one way to do that is by sponsoring them on GitHub — that's a good way to do it. You can also buy hardware; it's probably still in stock. This is also the type of work that I've always offered to pay Damien to do for CircuitPython. I don't think — Jim, do you have a company? I work for Damien. Right — so Damien has George Robotics, that's Damien's company, and that's who pays Jim, and I assume that the MicroPython sponsorship money goes through there as well. But yeah, this is the sort of benefit you get from paying devs to work on stuff. Anyway, we are extraordinarily grateful for all of Adafruit and CircuitPython's support, for your GitHub sponsorship from Adafruit as well. And yeah, it's really cool, and like I said before, I'm really excited to be working more closely together. Just this week we had a great example of that, where a feature was implemented in CircuitPython that we're now adding upstream. Yeah, the split types — and that's really cool, right? It saves a few kilobytes on our really constrained ports. It's really good. I think, you know, we had that meeting
between the five of us — you, Damien, Jeff, Dan, and I — and it felt a lot like a software team that never meets, right? Because we all work in the same place, we work on the same code, and we think about a lot of the same things. So it's cool to have those communication lines open, even if we're not in the identical code base all the time. But yeah, this has been great fun, Jim. Thank you for taking me into more of the emitter stuff. If people want to try this on CircuitPython, the code should be there — it's just not necessarily enabled or tested on our part, but we don't actively turn it off, I don't think. So if people do want to get into these weeds but still be in CircuitPython, you're able to do that. This is all within that core `py` directory that we share with MicroPython, and that we've talked about sharing more structurally in the future. But yeah — thank you very much for all the really good questions; lots of really good details that we got out of those questions. Thank you very much. 100%, yeah. So thank you to everybody who hung in through the technical difficulties at the start — including Jim especially, because all of the Australian folks got up early; it started at 7 a.m. in Australia. So thank you to Jim for being the guest, and thank you to all the folks in the chat who are in Australia or other time zones as well, who carve time out of their day to come hang out with us. This has been a deep dive with Scott, on Adafruit here. It normally happens at 2 p.m. Pacific time on Fridays. If you want to support me, you can do that by going to Adafruit.com and buying some hardware there. No hardware launches during the stream today, but it wouldn't be out of the question — that does happen from time to time. You can also purchase PyBoards from Adafruit, if you're in the U.S.
and that's convenient. That obviously supports the MicroPython folks as well. If you want to hang out, Jim's on the Adafruit Discord as well — you just have to ping him. Although don't DM — never DM people — but you can ping people in public places. You can join the Discord by going to the URL adafru.it/discord. I'm on there as well; feel free to ping me. Don't DM me, but you can ping me if you have questions. The reason to do it in public places is that other people may be able to answer your questions as well. So with that, I think that's it. Jim, thanks again, and you're always welcome on the deep dive. That was really fun. Thank you, Scott, it's been a blast, and I learned a lot myself. I hadn't looked at x86 assembly before, so that was new for me. And — yeah, when do we get CircuitPython on the PyBoard? You can load CircuitPython on the PyBoard already — once we started supporting the STM stuff, we added support for it. So yeah. With that — I don't even know if the cat cam's running — we'll just say thanks again to Jim, and we'll see y'all next week. Check out MicroPython if you haven't. Thanks, Scott. Thanks, Jim.