 Please give a warm welcome to Frankie. Hi, thank you. So I'm the lead developer, platform developer, for TheAtlantic.com, the eponymous website for a 160-year-old politics and culture magazine headquartered in Washington DC. Today I'm going to be talking about how to manipulate the internals of C-Python so that you can make the interpreter evaluate 2 plus 2 to equal 5. So a bit of background on my topic. Originally what I was trying to do was create a class decorator API that looks something like this. So here you have a class, and then you pass as its parent class a metaclass of sorts that tells that it should be patching, and then you define the methods, and you'd be able to use super to call the original method. It turned out it really wasn't possible, or rather it was, but it was very inadvisable, because it required you to mutate the MRO or method resolution order. The class is in a way that can only be done by accessing the underlying C-structs. But in this semi-failed attempt, I became curious about just how far you could push the limits of manipulating C-Python internals at runtime with C-types. Would it be possible, for instance, to make 2 plus 2 equal 5? And it also seemed maybe slightly topical with recent events. So it was one of those situations where I'd gone down a rabbit hole. Several points I came very close, and so the tantalizing nature of it kind of drove me forward to eventually achieve it. And also in the process, I learned a lot about how different parts of the Python interpreter work. And so I'll go over those. I'll sort of structure the talk so that it kind of recapitulates the order that I attempted to solve the problem, and then along the way explain the different techniques and underlying structures that inform it. So prior art and reference, there's a library forbidden fruit that does something sort of similar. It doesn't go to quite the same extent, but it does cover the first technique that I used to try to patch integer addition. And the full code for this presentation is in Python double scripts, which is at that URL there. And it has running unit tests and whatnot, so yeah. So let's say this test driven development, we're going to write a test case. Probably a context manager is a good idea because you don't want to set 2 plus 2 to equal 5 across the entire lifespan of the executable. That'll certainly crash something. I mean, I don't know how it wouldn't. So do it as a context manager, and then assert that 2 plus 2 equals 5. So naive approach, and it's only naive because it doesn't work, but I mean, in any other respect, it would work, is you take a reference to the old underscore underscore add method. You define a new method that if A and B are both 2, return 5, otherwise call the old add method. And then you try and maybe set in underscore underscore add equals in add. You get a type error. Can't set attributes of built-in extension type int. You try and set it on the dict. You get dick proxy object does not support item assignment. So quick kind of crash course in C types. C types allows you to load any shared library, but it comes with a convenience attribute for accessing the libpython C library by ways of ctypes.python API. Any C functions that are exposed in the library can be accessed as attributes of ctypes.python API. There are a few quirks with how it handles types. For instance, you set the attributes arg types and res type on the functions so that the ctypes module knows how to pass arguments. That is, how to convert, say, a number, an int object in Python into an actual C int, or whether it should actually be a Python object int. So in this case, we're using PyGet version. It returns a pointer to a null-terminated careray. So we use ctypes.ccarp. It also comes with a bunch of built-in types like ccarp, which is a pointer to a care, cvoidp, which is like a pointer to any address, et cetera. So arg types, when you specify it, it's a list. Each element corresponds to the type of that positional argument. It also has this very useful type PyObject, which lets you pass Python objects into functions and get them back as Python objects instead of as some sort of abstract type that you can't use. Alternatively, you could set the res type to cvoidp, for instance, as another type. That would just give you the address, and so you wouldn't be able to do much, but just as an example. So another really powerful feature of ctypes is the structure type. For this talk, the only thing will really be concerned with the structure class is the underscore fields underscore attribute. It's a list of two tuples. The first item is the name of the attribute, and the second item is the type. These provide a way to create Python objects that act like c-structs and can be passed as c-structs into c-functions and otherwise act more or less like normal Python objects. So here I've listed the struct definition for the base Python object PyObject, which is two fields. obref count, which is the reference count, and it's a PySizeT, which represents the size of a block in memory that can be read or written in a single operation. Though not entirely in Python 3, it defaults to the 64-bit version even on 32-bit machines, but that's kind of a minor detail. So the reference counts defined in the C code is PySizeT, so we followed that. And then the next variable in the struct is a pointer to the object's type. Like a quick recap of how structs are conventionally used in large programming language interpreters, because structs are basically contiguous blocks of memory and it's just a list of objects that say this first number of bits should be used as this type of object, the next set of bits should be used as that type of object. You can use the same, you can have a base type and pass that around and then as long as the memory is allocated for more things to be added to it, you can upcast it to something more specific. So for instance, you get a PyObject, look at the obi type, oh, it's an integer type, and then you call the method to sort of populate the integer and now you have a full hydrated integer field with a pointer to the number that it represents, for instance. And so in order to actually be able to use this, we need a way to be able to turn our Python objects into C type structures. In C Python, the built-in function ID returns the object's address in memory, so it's very convenient for this purpose because the from address class method on structures takes an address in memory and returns a struct for it. Side note, C types has a pointer function that transforms a type into a pointer of that type. We'll use that later and it'll make more sense in context. So here I'm getting the reference count of the PyObject. You can see in the first case, it's seven and in the second case, it's eight. The difference is that in sys getref count, there's an extra reference for the argument inside the getref count function. All right, so now let's try to override int add. We'll return to our original naive approach of patching integer addition in the dict. So if we could mutate the add item in the dict, we would be able to achieve our goal now. But since underscore underscore dict underscore underscore isn't a regular dict and is some sort of proxy for a dict, we need to do something with that. And it turns out that the underlying C struct has a pointer to an ordinary mutable dict. So another side note, a useful feature of C type structures is that like I mentioned about how PyInt object can extend PyObject, you can do that with the class inheritance syntax in Python. So here we have class dict proxy and because the struct starts with the PyObject attributes, those are the first fields and then the next field is the dict. When we use that, we can create a function that lets us mutate the class. It uses a little bit of trickery. This is the functionality that I sort of borrowed from the forbidden fruit library. You populate the dict proxy, then you create a temporary dictionary and you set an item with a key none and the value that dict and return it. And the thing you get back is a regular Python dictionary that you can change. So we make it mutable, we change the double underscore add method and it doesn't work. I mean, it does in a sense. You can call the underscore underscore add method and you get the desired result, but in every other case, it doesn't. Why doesn't overriding double underscore add suffice? In the C code, if you look at the function that adds numbers, it's doing something where it's looking at the slot function for the int type. So kind of a quick look here at what a py type object is. And so this again extends the py object and has extra attributes. One of them is a list of methods if the thing is a number. In the case that it is a number, it has a couple of functions. So we have number add, number subtract, number multiply, et cetera. For the first thing there, the syntax is a little bit odd for defining types in C that are functions or pointers to functions. The py object star is the return type. So it's returning a pointer to a py object. Star binary funk is the name of the thing. It's a binary funk is the name and it's a pointer. And then the next parentheses are the arguments. So we can set that with C types by using the C funk type function, which takes return type and then arg types as a list of arguments. So we duplicate that there. And we use a structure to represent the py number struct. So here we have the fields for number add and we specify as the type the binary function. All right, so we put it all together. We define the py type object copying over the attributes that we had over here. And we populate it with from address ID int. And then we try and call number add and we get what we would expect to plus two is four. So ignore the top function there. So again, we're going to try and do it as a context manager so we don't blow everything up in the process of testing. So we get the tricky thing with the original approach of grabbing the original add function and then using it again is that it wouldn't work if you were to just use nb add. Because when you change it, the thing you're then referencing is still a pointer to that change thing. So you need to get the address of the original function and then create a new binary function that points to that address. So that's what we do here in these first two lines of old nb add address and old nb add. We define our new add function. And then we replace the function and call the original the same way we did in that naive approach. We get pretty close. If you just do two plus two, that doesn't work. But eval two plus two does. So let's use the disk module to see what's going on disk. It lets you pass it either a code object or a string of code. And it'll give you back diagnostics and information about the tokens that the abstract syntax tree that Python uses to represent it. So here we have a global variable two equals the number two. And then a function add two plus two. And then inside return two plus two. And you can see how it interprets that is load the global two, load the global two again, add the last two things, and then return the value. So that binary add instruction opcode. In the C Python code, there's a little optimization where it checks if both of the things are ints. And if they are ints, then it does the addition in C as a sort of speed optimization. So that doesn't actually call on scrum score add. So how do you fix that? Well, you change the class of two to something other than int. Call it int two. So define int two extends int has its own add function. So it's not an int, not exactly anyway. And then we go to set it type error class assignment only for heap types. But of course, this is something we can get around by manipulating the structures. So here, we have a function that sets the type. There's a little bit of, I was unsure about whether I should include fully functional code that had things that increased references and decreased them for simplicity, remove them. I opted to keep them in. But the important thing here is the grabbing of the old type, populating the pi object from the new thing and then overwriting the OB type. And then having a sort of context manager to let us change two to be an int two and then back to an int afterwards. So here's where we are. With override type two int two, we evaluate two plus two. It's five. So same results we got before. We define a variable two. And then we override type two to int two and print it. We get a five. But if we do just two plus two, not the variable two plus the variable two, we get four. Why? The final obstacle is something called P poll optimization. It's called a P poll optimization because it sort of looks through the abstract syntax tree in little windows and tries to find bits of code that it can fold together or simplify or turn into something faster. So one of the things it does is it looks if there are two literal integers being added together and if they are, it combines them. So if we use that disk module and instead of using a variable two like we did the last time, we use a literal two, then we see that in the byte code, it just has a four. And if you were to look at the pi c file, if you were to analyze a pi c file, you would see a four there. There would be no two and two. So that's why that kind of prevents us from doing what we wanted to do. Um, in C Python, this is performed by the C function pi code optimize. It doesn't occur in an eval, which is why eval two plus two works, but not when it's defined in a Python function or an interpreter. So this is the craziest part of the way to get around this, which is to disable Python code optimization with what's called a trampoline function. You basically take the memory where the pi code optimize function is, you overwrite the first few assembly instructions to jump to a new address. That new address is a no op function that just returns the code unchanged and increments the reference and you go from here and success. You run tests and our test passes. So, yeah, that's sort of my talk. Quick kind of thing because we're allowed to do this. We're hiring at the Atlantic. We're looking for DevOps. We're looking for full stack. We're looking for front end developers. We just opened an office in London. James Fallows, if you follow the publication, James Fallows has moved to London and he's opening the London office and a few staff writers are moving with him. So I think there'll be room for developer in that office. And then also obviously plenty of room in Washington, DC or with our marketing team in New York. And contact information and again that URL for getting the code, which has a lot more detail. Like I sort of showed a simplified version of pi type object. It has like a fully fleshed out like every attribute matches whatever it is in Python two and Python three and has the differences between both figured into it. Thank you. We've got plenty of time for questions. So if you have any questions, please raise your hand. We have a microphone for recording them. Can you show us the code which you use to patch the assembly code to introduce a trample line function? Could you... Maybe I'm not sure I understand the question. Which code? In the last slide you showed us the one before, yes. Oh, this. Yes, how did you do it? Sure, I can pull that up. So in order to paper over the differences between Windows and Linux and Mac, there's like a lot of sort of boilerplate constant stuff here. But the main thing is there's a function called mProtect where it exists is different on Windows and in Linux and Unix and OS X. But you give it an address and memory and a length and you tell it how you wanna change what that memory does. So by default, the executable memory of Python that gets loaded into memory is not writable. But you can use this function to make it writable. So the first thing you do is you figure out where the function is and how long you need to set the jump instruction for. And then you make it writable there. So that's the mProtect stuff here. You define your no op which is to increment the reference because that's what it does in the original C Python code so it seemed, you know, maybe it's superstitious but it made sense to do it. And if you're overwriting a function in this way, you can't return pointers to Python objects in the normal way. So I just returned the address which it interprets correctly as a pointer. You have this quaternary function that takes four Python objects and doesn't return anything. That's what the PyCode optimizes. The first one is the code and then there's like the locals and the frame, et cetera. And then here's the override. So if it's not an x86 architecture, it throws an error because I didn't write it for ARM or any of the other machine languages. You get the pointer to the old function and to your new patched function. You change, the jump instruction will be five bytes. The first one is the jump byte code which is E9 and then the next four is a relative offset to the new address. So you set those five bytes as readable, executable, and writable. You find the offset between the two and then this was from me kind of just testing it. And then you combine them all together into a list of op codes. You can, the multiplication operator with C types lets you create arrays of things. So here we have like an unsigned byte and multiplied by five will give us an array of five of these things. And then we do from address, we get back an array that has five bytes each for the different instructions we're gonna overwrite. And then we just use the slice syntax to replace it. And then once you do that, that patches it to be a jump instruction of the new function. Hello and thank you very much for the talk. So it's basically, I have two questions. What kept you growing? Because there's always a certain point where you might say, I can't keep doing that. And how long did that take you? It actually didn't take all that long. I was pretty determined. I think one of the, this particular bit of code is interesting because this is how a lot of root kits and things work is that they'll sort of set memory writable and then like change actual assembly instructions in memory in order to, you know, exploit some sort of kernel vulnerability or something like that. And so like kind of delving into that was interesting in its own right. And so that kind of kept me going along that route. And then once I knew how to do it, then I sort of transferred it to this. So it wasn't so much that like I was determined to get two plus two to equal five so much as that like the things that I had to do in order to accomplish it and the things I had to learn were interesting enough on their own that kind of kept me going. Thanks for the talk. My question is besides having fun, have you ever had a need to do such stuff in production? Or no? No, although there is one case where I have considered using something like this. If you want to change, so going back to that original like attempt to do the patching patch class where you can just call super to call the original instead of having to do something weird like pass the original function as the first argument, which is what a lot of patching things do. That would have been useful for, I've been working on like this sort of combination of PIP tools and PIP does this thing where it does a reset hard for any editable repositories. And it'd be really great if I could like hook into that and instead of have it reset and potentially destroy your changes in an editable package that's in the SRC folder, it prompts you and says this branch is dirty, this there are uncommitted changes. Do you want to do something with that as a way to sort of make it easier for our developers to sync up their code with the latest changes on our GitHub. So that was like the one place where it's not exactly production, right? Because it would only be run within the span of like a developer updating some requirements on their own development machine, as opposed to say in production where you'd build from scratch or something like that. But you know, a developer wants to save time and just update the one package. So that's where I kind of looked into this. Never has actually happened, but it's a possibility. Thanks for the talk. You mentioned that the people optimizer runs quite early and that the twos do not even show up in the by-c file. Is it the people optimizer removing these twos or is it, do you need to disable the people optimizer right at the interpreter start or can you do this on the spot? I'm confused on when things actually run. When does this optimizer run and what's the reason this four shows up in the by-c and not the two? Is that the same or are those two different passes? Sure. So the by-c is sort of like a cache of the compiled byte code. So if the by-c was generated without having run this people disable optimization thing, you're gonna, it's not gonna work. Like the, the by-c is gonna trumpet. There's like an environment variable you can set to not use the by-c files. But, and so actually in the run tests, like the sort of test runner, I set sys.don'twritebycode equals true just so that, you know, there aren't any by-c files to sort of mess things up and cause that to occur. But yeah, so that's sort of how it connects with the by-c files. Otherwise, if there isn't a by-c file yet when it executes and it's set to write the by-c file, it'll use whatever is in memory. So either the people optimizer is enabled or not and then generate the byte code and save it to disk after that. So hypothetically, if you ran this once and it generated a by-c file that had the twos distinct, then you would, it would stay that way the next time you ran it with by-c files. Hi there, you had a really interesting hack-ish talk. In the previous file, you had some Microsoft, I know, HeyTalk, something like that. It's an interesting, how easy was it to find? A bit higher, narrow, higher up the file. You have some constants and then you shit-talk in Microsoft. Oh, these here? Okay, yeah, from the client, though. Yeah, so these? What was that, the debug? Where do these come from? Yeah. So for the constants for read, write, and execute privileges, those are in this sys-m-man.h include file that's on Linux Unix systems. And then Microsoft does this thing where they have bitmasks. So the benefit of bitmasks is you have one and you have two and then you can and them and then you have like the byte, the first byte is one and the second byte is one and so they're both on. Microsoft does this weird thing with this function where beside, like even though all of the values are offset by one in the list of bits, they also have extra properties that combine them instead of just like ending them, which would be the sensible thing to do. But anyway, these I found from like a Microsoft.com, you know, like explanation of how M-Protect works on Windows. Okay, because it seemed like horrible to debug in a test environment. It actually like, I followed the spec and it just worked. I mean, because the functions are pretty similar. They're like the differences are kind of minuscule. So once I had it working on Linux Unix, then I just had to like make a few tweaks to get it to run on Windows. Like the flags that you pass are slightly different. The order of arguments is different, but in every other respect, it's the same. Okay. If there are no more questions, just a quick note on the EuroPython app in your mobile, you can just rate the apps, you can add comments, you can thank the speaker, you can add any feedback you want, any constructive feedback for the speaker. So please thank again, Frankie.