Hello everyone, thank you for attending my talk. As you said, my name is Julia, and you can contact me on those social accounts. I work at this little startup called Big Code in Spain. What I'm going to tell you this morning is the story of why I decided to package my client application as a binary, how I did it, and the implications I found along the way.

So let's start at the beginning. It's April 2013. We have just finished a prototype of our application, written internally in Java. We have this new smart engineer on board, who is also very brave, because the first thing he does when he joins us is try to convince the CTO to move to Python. He provided rock-solid arguments for why we should do that. But the CTO wasn't entirely convinced, because Python, even if you only distribute the .pyc files, is very easy to decompile, and the investors wouldn't be happy with that. But I was hating Maven at the time, slow as hell, and I also wanted to move to Python. So I said, well, just let me do some research on this. I really thought it would only take me a couple of minutes on Google to find a solution to the problem. So I went and typed "obfuscate Python". And you can laugh at me if you want to.

Because the answers I found weren't answers at all. They were of this kind: Python is not the tool you need. It wasn't designed that way. It's against its philosophy. Plus, everything that has ever been written in Python is open source. And if you want to do it anyway, it's really hard. And even really compiled applications can be reverse engineered. They hack Windows all the time, so they will hack your application too. Quit your job: if your company is trying to do such unethical stuff, you should quit your company right now. Code protection is overrated. And just writing a legal requirement should be enough. Well, for me, that's just a bunch of excuses and lies.
I mean, Dropbox originally, I don't know if they still do that, was written in Python and was obfuscated. And yes, they hacked it. They hacked Windows, yes. But I wish our application had as many people trying to hack it as Windows or Dropbox have. So you are telling me I'm trying to do something that's not possible, that I cannot do whatever I want with my own code, because people don't do that and because it's hard? Really? In my previous company, they compiled PHP to C. So that's not going to stop me. Now I want to do it personally. I want to do it as an intellectual exercise. I want to discover whether I'm capable of doing this.

So the statements I showed before were everything I found. There was one guy suggesting that maybe you could try to use Cython to compile your Python code into C code and go on from there. So that's where I started.

This is the process I came up with. The first step is to take your Python code and convert it to C code with Cython. Then you compile it with setup. Then you need to package it and create an actual executable, and I use PyInstaller for that. With PyInstaller you get a folder with everything you need: the executable and all the external dependencies you may have. You can take that folder and pass it to any installer-building software for your system: Debian packages, a setup for Windows, DMG packages for Mac. Well, this is how everything is done.

Converting your Python code into C code is actually really easy. I don't know if you can read the code, but what we're doing here is walking through our source directory and replicating it into a new folder, because you probably don't want your C files placed right next to your Python files. For every Python file, you call the cythonize method, which is cool: you can tell Cython not to force compilation, so if a file has not changed it won't convert it again, which saves you a lot of time. Well, that's all.
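The slide code is not part of this transcript, so the following is only a minimal sketch of the step just described, under some assumptions: the helper names `mirror_sources` and `convert_to_c` are hypothetical, the build folder is assumed to live outside the source tree, and Cython is assumed to be installed when `convert_to_c` is called.

```python
import shutil
from pathlib import Path


def mirror_sources(src_root, build_root):
    """Copy every .py file under src_root into build_root, keeping the layout.

    Assumes build_root is NOT inside src_root. Returns the copied paths.
    """
    src_root, build_root = Path(src_root), Path(build_root)
    copied = []
    for src in sorted(src_root.rglob("*.py")):
        dest = build_root / src.relative_to(src_root)
        dest.parent.mkdir(parents=True, exist_ok=True)
        # copy2 preserves modification times, so an unchanged source file
        # does not look newer than the .c file generated from it earlier.
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied


def convert_to_c(src_root, build_root):
    """Mirror the tree, then ask Cython to generate a .c file per module."""
    # Imported lazily so the copy step can be used without Cython installed.
    from Cython.Build import cythonize

    paths = [str(p) for p in mirror_sources(src_root, build_root)]
    # force=False lets Cython skip files whose .c output is already up to date.
    return cythonize(paths, force=False)
```

A call such as `convert_to_c("src", "build")` would leave the generated C files in `build` next to the copied Python files, keeping the original source directory clean.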
Now that you have your C files, this is where things become a little nasty and hacky and obscure. I haven't found a way to actually ask sysconfig which compiler flags are in use. They seem to be stored in a static dictionary that's created the first time you call sysconfig.get_config_var. And what happens there is that you don't really know which entries of the dictionary are being used, and some of the flags are duplicated across various entries. So this is trial and error, mostly.

The first thing we're doing here is walking our new source tree of C files and creating an extension for every C file. This thing here, PYREX_WITHOUT_ASSERTIONS, is there to disable assertions, because you probably don't want assertions in production. Then, for the different platforms, you have to override the flags you don't want. What happens is that on Unix systems, extensions are compiled with debugging symbols in them, which makes your compiled application bigger and slower, so you probably want to disable them. And then three days ago, I discovered that on one of our Mac machines traces were enabled by default, but on the other one they weren't. I had just discovered it, so I had to add this new override here. But once you're finished hacking your sysconfig configuration, you just need to call setup with your array of extensions, and everything gets compiled.

So now you have your Python application compiled as native extensions, but you probably still depend on some external libraries, so you want to pack it all together. As I said before, we are using PyInstaller for this. We had some problems with external dependencies with PyInstaller, so what we did was create a fake main file which imports the real native-extension main file and all the third-party stuff, because sometimes you need to import some modules explicitly. Well, this is also a bit of trial and error.
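The flag overrides and the per-file extensions described above might look roughly like the sketch below. The helper names (`strip_flags`, `override_config_flags`, `module_name`, `make_extensions`) are hypothetical, `-g` is used as the example of an unwanted debug flag, setuptools is assumed to be installed when `make_extensions` is called, and package `__init__` files get no special handling.

```python
from pathlib import Path


def strip_flags(flags, unwanted):
    """Return the flag string with every token listed in `unwanted` removed."""
    return " ".join(tok for tok in flags.split() if tok not in unwanted)


def override_config_flags(config_vars, unwanted=("-g",)):
    """Remove unwanted compiler flags from every string entry, in place.

    The same flag can be duplicated across several entries, so this sketch
    cleans every string value instead of guessing which entry is used.
    """
    for key, value in config_vars.items():
        if isinstance(value, str) and any(t in unwanted for t in value.split()):
            config_vars[key] = strip_flags(value, unwanted)
    return config_vars


def module_name(c_file, build_root):
    """Turn build/pkg/mod.c into the dotted module name pkg.mod."""
    return ".".join(Path(c_file).relative_to(build_root).with_suffix("").parts)


def make_extensions(build_root):
    """Create one Extension per generated .c file, with assertions disabled."""
    from setuptools import Extension  # imported lazily: needs setuptools

    return [
        Extension(
            module_name(c_file, build_root),
            sources=[str(c_file)],
            # Defining this macro compiles Cython's assert statements out.
            define_macros=[("PYREX_WITHOUT_ASSERTIONS", None)],
        )
        for c_file in sorted(Path(build_root).rglob("*.c"))
    ]
```

One possible use is `override_config_flags(sysconfig.get_config_vars())` followed by `setup(ext_modules=make_extensions("build"))`. Whether the build actually honours the edited dictionary depends on the distutils or setuptools version, which matches the trial and error mentioned in the talk.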
So you pass this file to PyInstaller, and the first thing it does is create a specification file, which you can configure a bit to tell it where your binary contents are. In the first line here we are telling it to include images. Then there are some external modules, like this one, which contain binary files, so I'm just telling PyInstaller to copy the whole directory. I had some problems with crypto on some machines, so I did the same: I told PyInstaller to copy the whole thing into my project. Well, that's all. You get a folder with an executable file, which you can copy to your client's machine, preferably going through a standard way of doing that. But that's mostly all.

Well, have I achieved my goal of improving security? With PyInstaller, you can package your application in two different ways: as a single big file, or as a folder which contains everything. The problem with the single big file is that it's compressed, and every time you execute it, it needs to decompress itself into a temporary folder. That works great for graphical interface applications, but it's really slow for a command-line application, as is our case.

It's really easy for hackers to discover it's Python, because if you package your application as a folder, they see all the files in there and can recognize stuff. But even if you package it all together, they can execute your application inside a program that prints every assembly line, sometimes with some extra help like this thing in there, so anyone would recognize that it's running Python on the inside.

Can they reverse engineer you with that? They can probably import your native extensions and invoke your methods to discover what they are doing, but they cannot actually see the code. They even get some help, because if you don't tell Cython not to, Cython by default will include the docstrings of your methods. But, well, it's safer than not doing anything.
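To make the last point concrete, here is a small sketch of what someone can learn just by importing a module and looking at it, without ever seeing its source. The helper name `visible_api` and the stand-in module `demo` are hypothetical; a real compiled extension would be imported with a normal `import` statement instead.

```python
import inspect
import types


def visible_api(module):
    """Map each public callable in `module` to its docstring (or None)."""
    return {
        name: inspect.getdoc(obj)
        for name, obj in vars(module).items()
        if callable(obj) and not name.startswith("_")
    }


def _parse(path):
    """Parse one source file."""
    return path


# Stand-in for an imported extension module, built by hand for the example.
demo = types.ModuleType("demo")
demo.parse = _parse

print(visible_api(demo))  # {'parse': 'Parse one source file.'}
```

Names and docstrings are visible this way even when the implementation is compiled. Cython documents an `Options.docstrings` switch for leaving docstrings out of the extension; it is worth checking against the Cython version in use.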
Other implications, you may ask. Well, I'm using C, so is this any more efficient than just running the Python code? I did a little benchmark, but first I need to explain what our project does. What it does is take a C++ project's source tree and analyze it, discovering all the interconnections among the files in the project and the external projects you may be using. So it's CPU-bound processing.

In this benchmark, on the X axis I have the number of files in your C++ project, while on the Y axis I have the time to process them: just processing time, not reading from disk, obviously. What happens is that for small or medium-sized projects, like under 500 files, the time gain is around 7%, which in itself is not bad. But for really, really big projects, and this last one is the SDL library, which has over 2,000 files, the gain was three seconds, which from a user experience perspective is a lot, and it's a 32% time gain. So I think, overall, the process wasn't hard, and we gained something along the way. So I think the reaction on the internet wasn't good enough. So, well, time for questions.

Did you change your Python code, or did you just use it as is with Cython?

I have a series of blog posts written, with wider snippets of code. I will put this on the internet later, so you can go and read them. So, time for more questions.

Hello, thank you for the talk. Do you think it's important to cythonize the entire application, or would it be sufficient to cythonize only the kernel, the stuff that you do differently from others, and leave everything else in Python? And on a related note, how do you debug this stuff?

Could you raise your hand? Because I'm hearing you from behind me. Well, it isn't difficult at all to cythonize the application. I don't mind cythonizing it all, or just the processing part. How do I debug it? It's in Python. I run my tests in Python.
I can set the debugging level even when I'm running the cythonized application, so I can see all the traces. So I've never found a problem I couldn't solve by running the application just with Python.

If you've already done this, then I assume you're happy with it, especially with the extra performance. But when you were doing your research, did you consider writing a custom loader, and maybe taking the marshal module and hacking it up so that everything looks different, and sort of obfuscating that way?

I preferred to use something that was already there and working, because I knew I would probably write more bugs than useful stuff if I tried to write my own obfuscator. So as this worked the first time I tried it, I stuck with it.

So I saw that you called the cythonize function manually, but you probably know that there is an extension of the Extension object that automatically does the cythonizing part for you, so you can actually pass the .pyx file to the Extension. Is there any reason you did it that way?

No, this is the point I started from and built on, so the process can probably be improved a lot. I don't know if calling the Extension directly would give you this not-compiling-again behaviour. Probably it does.

Yeah, it does.

So no, there isn't a reason for that.

Because I had an extension written in Cython as well, and I think the way you can pass compiler options is nicer when you do it through the Extension object.

Yeah, but the compiler options I'm hacking here belong to the extension library. I'm calling Extension, then I'm setting up the options, and finally I'm calling setup, so there is no Cython involved anymore at that stage.

Yeah, so what you're saying is that Extension chooses some compiler flags by default which you don't want, and you have to remove them.

Any more questions?

Can I ask a question? What's the implication for testing if you have a binary rather than your... Oh, sorry.
Sorry, what's the implication for testing with the binary? Are you having to test it all twice?

No, because as I said before, I've never come across a bug that happens only with the binary application and not with the Python one. We run some final tests with the binary, just to be sure, but the big suite of tests is run against the Python code.

Any more? Well, thank you very much, Julia.