Yes, hello, and welcome to my talk. I'm here as a private individual and hobbyist, not on behalf of a company. But at least my project now has a logo, which you can see on the right side.

Here is a quick overview of the topics. I gave talks at EuroPython two and three years ago, and those were longer talks; this time I was going for a faster one. I have some actual good news, and I have some problems to share with you.

So, who am I? My name is Kay Hayen. I'm a professional from the ATC industry, and I do this as my hobby; it's my spare-time effort. And that spare-time effort, as the title of the talk already disclosed, is "the Python compiler". I got a bit preposterous about this, but if you see the goals, it makes sense.

I will show you what it takes to use it, which is basically nothing. We are going to compile a simple program and a full-blown program, Mercurial, but there is not going to be a lot of time to look at this, so that's going to be fast. I'm going to present my goals and the plan for how we get there. The last point on the slide, "join me", is the most important, because this project has really high potential and it's limited by the number of contributors so far. I am mostly on my own. I have a few people who help me and sustain parts of the project, but this is not enough, so it's going slower than it could, although, as you will see, it's progressing pretty well.

Then I will have a look at some details of Nuitka. As you know, Python is a very dynamic and complex language, and I have taken steps to reduce the problem. There are some common complaints here: everybody knows that Python is highly dynamic, so how could a compiler even work? Then we look at optimizations.
That's what we have so far; that list got longer recently. And actually, I'm demoing something here that did not work when I wrote the title of the talk, and it didn't work until maybe last week: the inlining of function calls. I see this as a breakthrough for the compiler. Practically, it's probably not, but technically it is a very good achievement. And then: what else is going to come.

Maybe let me start with the name. It's named after my wife Anna and, as was suggested seconds ago, I could have named it like that. But I named it after her in Russian. In Russian she's called Annuitka, and in short form Nuitka, which is tricky because it's pronounced differently than it is written. But that is the name.

I started it after mingling with other projects, PyPy and Cython, to be a fully compatible compiler that doesn't have to make any compromises and doesn't have to invent a new language and so on. I was thinking out of the box. Most people see Python as a very powerful tool for some part of the language landscape, but not all of it, and I wanted to take it to where the performance-critical stuff happens as well. I don't feel any time pressure, so I do this the right way, and the right way means that it carries all the weight of Python all the time.

It's licensed very liberally, and you can use it with everything, so it's free software of the most free kind. Most major milestones are now achieved. It's basically working, if all you want is an accelerator. It's going to work on all the operating systems. Android and iOS need some work; in theory they should work, and I know that some people have done some things there, but it's still future work. Obviously, Python in the mobile space could use some help, and maybe Nuitka can provide that.

It works with older Python versions and newer ones alike, even the latest 3.5 beta. Anticipating a question from you, I added support for that.
It passes the CPython 3.4 test suite running as compiled code. It takes a C++ compiler (I will cover that issue more on later slides) and it takes your Python code, and that is it. So it's really just a C++ compiler and Nuitka, and you can compile.

Having a new language that is separate from Python means I lose all the things I like. I put them all on the slide here, and I'm trying to be a bit fast about the presentation, but you know, there are lots of things that you are used to, and if it's not Python but something else, then we just lose them. So I put a kind of stop sign below there. Very important to me is that if we have a fast Python, it should be a Python, much like PyPy tries to be one, or Jython tries to be one for Java, so that I can switch back and forth. If you start using Nuitka, there is no price attached. Your project is not bound to using it: if you encounter a bug in Nuitka and it stops working, you can just use something else instead.

My ideas here for performance are very old ideas, and I've not actually done anything in this direction yet. I know that Guido is running around presenting ideas for type hints, and everybody asks me whether I will support them. The answer is yes, although technically I would like something that also works during the runtime of Python.
In his proposal, it's just something that Python very explicitly ignores and doesn't use, and I don't like that at all. I want it to be code that actually improves the quality and makes these actual checks, and then the compiler just gets to benefit from the knowledge extracted from such checks.

The first goal, and one which I met a couple of years ago, was feature parity with CPython. It's compatible with all the language constructs, and it's also compatible with the runtime. So PyQt, lxml, whatever extension modules there are, you can use them. The compatibility that I have achieved, and have increased since, is amazingly high. Basically, my first attempt at Nuitka was to demonstrate that something like a Python compiler can actually fit into things without having a price, and this is now what I consider a true statement. So from there, on to the next thing. Some of the projects I mentioned need patches: PyQt, PySide and so on sometimes make too tight a check on what a function is. I have a compiled function type, and they were not tolerant of this without patches.

The next thing is to generate efficient code from that. As you will see, on the pystone benchmark I achieved a two-and-a-half-fold speedup. This is something I looked at, but it was only a proof of concept. It was only to show what we can gain if we don't have bytecode but have compilation. It's not really worth it, so I think this sort of speedup is unimportant. What's new is that the code generation is now starting to remove code that is not used, and it's using traces to determine whether objects need releasing. And as we will see later, exceptions are now fast; I have a slide about this.

Then constant propagation, which is basically just peephole optimization: identify as many values as possible and push them forward. So if you assign a constant to a variable and use that later on, you generate efficient code.
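To make the idea of pushing constants forward concrete, here is a minimal sketch using Python's standard `ast` module. It is not how Nuitka implements this. The names `ConstantPropagator` and `propagate` are made up for illustration, and the pass assumes a single scope of straight-line code (no branches, loops or nested functions). It only folds `+`, `-` and `*` on plain integers, and `ast.unparse` needs Python 3.9 or later.

```python
import ast
import operator

# Only these operators on plain ints are folded in this toy pass.
_FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


class ConstantPropagator(ast.NodeTransformer):
    """Toy pass: forward constants through straight-line assignments."""

    def __init__(self):
        self.known = {}  # variable name -> constant value known at this point

    def visit_Assign(self, node):
        node.value = self.visit(node.value)
        # Any name that gets (re)assigned is no longer known ...
        for target in node.targets:
            for sub in ast.walk(target):
                if isinstance(sub, ast.Name):
                    self.known.pop(sub.id, None)
        # ... unless it is a plain "name = constant" assignment.
        simple = len(node.targets) == 1 and isinstance(node.targets[0], ast.Name)
        if simple and isinstance(node.value, ast.Constant):
            self.known[node.targets[0].id] = node.value.value
        return node

    def visit_AugAssign(self, node):
        node.value = self.visit(node.value)
        if isinstance(node.target, ast.Name):
            self.known.pop(node.target.id, None)
        return node

    def visit_Name(self, node):
        # Replace reads of a known variable by its constant value.
        if isinstance(node.ctx, ast.Load) and node.id in self.known:
            return ast.copy_location(ast.Constant(value=self.known[node.id]), node)
        return node

    def visit_BinOp(self, node):
        self.generic_visit(node)  # propagate into both operands first
        fold = _FOLDABLE.get(type(node.op))
        left, right = node.left, node.right
        if (fold is not None
                and isinstance(left, ast.Constant) and type(left.value) is int
                and isinstance(right, ast.Constant) and type(right.value) is int):
            folded = ast.Constant(value=fold(left.value, right.value))
            return ast.copy_location(folded, node)
        return node


def propagate(source):
    """Return `source` with known constants pushed forward and folded."""
    tree = ConstantPropagator().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    print(propagate("x = 2\ny = x + 3\nz = y * x\n"))
    # x = 2
    # y = 5
    # z = 10
```

A real compiler also has to reason about control flow, scopes and side effects before it may forward a value, which is what the variable tracing mentioned in the talk is for. This sketch deliberately ignores all of that.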
Constant propagation is something I have only recently achieved. What I haven't got yet, and what will be an important part of getting any actual improvement that is worthwhile for anybody, is type inference: treating strings, integers, lists and so on differently. That's only starting to exist.

Then interfacing with C code, the so-called bindings. I had a discussion about this with somebody this morning. Nuitka will and should be able to understand ctypes and CFFI and make direct calls; I have a slide about that too. And hints, type hints, don't exist yet, so not this year.

So here I have an outside view of Nuitka. On the top left you have your code, which you put through Nuitka. It can be multiple files: Nuitka recurses according to your PYTHONPATH and just finds your code. It produces from that a bunch of C++ files, puts them in a directory, and then runs SCons. What typically happens is that people tell me, for reasons that I do not really understand, that SCons is somehow bad. I don't think it is; it does the job. I have an SCons file in Nuitka which can then be used to produce a module. So if you want to deliver an extension module from your Python code, that's feasible, even whole packages, or you can produce an executable. From a user standpoint it's Nuitka and your code, and that is basically it. SCons handles the C++ details, and I get very nice emails from people who say it even found their Microsoft compiler and just works. It's very easy to use, so there is a very low barrier to entry.

When we look inside, you will find that I have a couple of phases. It's based on the abstract syntax tree, the same one that Python uses. So in a sense I'm reusing the Python parser, which is one of the benefits of not having a separate language.
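As a small illustration of what reusing the Python parser means, the standard library's `ast` module exposes the same abstract syntax tree that CPython builds. The helper `statement_kinds` and the `SAMPLE` snippet below are made up for this sketch and say nothing about how Nuitka walks the tree internally.

```python
import ast


def statement_kinds(source):
    """Return the AST node class name of each top-level statement."""
    return [type(stmt).__name__ for stmt in ast.parse(source).body]


SAMPLE = '''
import os
with open(os.devnull) as f:
    data = f.read()
for ch in data:
    print(ch)
'''

if __name__ == "__main__":
    # The code is only parsed here, never executed.
    print(statement_kinds(SAMPLE))  # ['Import', 'With', 'For']
```

Every construct of the language arrives as a node type such as `With` or `For`, so a compiler that starts from this tree accepts exactly the syntax that Python itself accepts.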
Then I enter a step called reformulations. For example, in Python 2.6 the with statement got added. I could have a "with" node and generate code from that, and actually the first versions of Nuitka did that: I had a C++ template and generated code which just happened to do the proper, compatible thing. But that's not how it's done anymore. We now have reformulations, and with these the with statement ends up as simpler Python. We are going to see a few examples of that.

I'm speaking very fast, and I try to be fast; the idea is also that you can get your questions asked. So if you have a question, just raise your hand and ask whenever you think you have one. Please do so.

Then we go into optimization, which is basically an endless loop. When optimizing a Python program you cannot have a single-pass or two-pass approach, because after every optimization any other optimization may become feasible again. So it's an endless loop, but it finishes at some point, and then finalization is entered, which just annotates the code a bit more. The final tree then goes to code generation, which produces the directory we were seeing. That's all very typical. What's probably special is the reformulation step, which tries to make a baby Python out of things.

Time for a demo. This is a Python function. It has a nested function, it does local variable assignments, and then it makes this call, which can be inlined. You and I, as humans, can see what happens. The thing I'm very proud of is that I now have variable tracing and SSA that are sufficiently strong to justify this, so that on a global scale Nuitka will be able to understand that sort of code and produce a simpler result. It has a verbose mode, and here we get a look inside at what happens there.
When it runs, there are try blocks, which is sort of true because of a reformulation. For example, this statement here does an unpacking, and secretly that involves try/finally semantics: if you get interrupted while unpacking, you have to release something. But the static analysis finds out that the try blocks can be reduced. It finds out that the assignment to g can be propagated entirely and therefore be dropped, and the same for the one to x and the one to y; the value is then actually propagated. Then in line 9 here we have a constant tuple, a constant result. We can replace the call to g with a direct call, and we can inline the function. We can discover that there previously was a variable g, but it is no longer used now, so it's not assigned and then not initialized anymore. So the releasing of this uninitialized variable g can be removed. Should there be an exception in a Python function, your locals get released; that's what is behind this. Then we propagate the inlined variables, build a tuple and so on, remove all the try handlers, and ultimately we are done with that.

This next one is an even simpler program, but it will help me to make a better demonstration. Better in the sense that I started to have analysis for the unpacking years ago, but it's not yet sufficient, so I cannot have tuple unpacking and show a full reduction. When we run this, as you can see, it outputs a lot of findings. For easier debugging, I have invented an XML representation of the node tree, and I use this to test that something is entirely optimized. As you can see here, we have a return statement.
It just returns a constant. So this function f, all it does is a lot of churn around the notion of producing one constant. Obviously your code is not going to be like this, but it could be: if, for example, the x were an input value of some sort, and if this was already a partially optimized function for some reason, then these things make sense. So we got this. Any questions about this?

The question was: is it storing the reduced Python code anywhere? Actually, that's a cool idea for a project that I have, to generate Python code from the reduced one. Right now I only generate C. I would love for somebody to take the internal representation of the optimization and generate Python code from that, Python code that is just faster than the other Python code. But since we are making a Python compiler for a reason, I'm going to C directly, and C is what I'm outputting. Technically, the final internal representation is not entirely Python anymore. As we will see in the reformulation part, while loops and for loops, for example, don't exist. So it is a reduced set, but it would be feasible, for example, to create Python code out of that.

Maybe quickly, here. No, don't do this. Because I made XML, and because I'm easily confused, I removed the code that is not used. I had opened it already, but here we go again. This is what the generated code looks like, for example. We have a local variable for the return value and initialize it to nothing. Then we initialize it to the result, which is a pre-made constant, and we go to the function return exit, which checks that there actually is a return value and then returns it. So in the Python world, this has become the most efficient code that you can have.

Obviously, there's more to it. We can also do Mercurial. This is something that worked two years ago.
It passes the Mercurial test suite. So we could compile this now, and actually I was doing it. It took 35 minutes to compile all of Mercurial, which is a huge body of code. Right now Nuitka is not making enough optimizations and not discovering enough dead code, but half an hour is pretty okay on this laptop without power.

The generated code works like this, and I will be quick. It's C code now. When I initially started out, I was aware that this is a very ambitious project. The only reason I even started was that C++11, the new C++ language, had so many cool new features that it convinced me code generation would be relatively simple now, so the gap between C++11 and Python was relatively small. It turns out that, for example, C++ exceptions suck, and in-place operations of Python are optimizable, but that doesn't fit the idea of one object being only one thing. So I went to C++03 and then to C-ish C++, which is basically just C with some C++ elements but no classes, no types, and it's going to be C99 soon enough.

I'm going to skip something. As an evolution: three years ago, and I'm talking about the C++11 version now, the blue part was the code generation. I had achieved something phenomenal, and that was a compiler which was capable of integrating with all of the Python landscape and made things faster, which was tremendous. But the other parts are so small you can barely see them: the parser, and optimization, where there was basically only peephole optimization. Two years ago I went to C++03, the code generation got a lot dumber, reformulation started to appear, and optimization became bigger. And right now, code generation has become really stupid and optimization is carrying the day. So now, with these reformulations,
I'm making some overhead there, using temporary variables and so on, and I can now optimize that away.

On availability: I have a high focus on correctness. It's available in a stable and a develop form; the develop form is also better than other stable projects, I contend. And I have a factory where I publish things that are not finished yet. For example, the inlining code is on a factory branch right now. There's not just Git, there are RPMs and so on. Lots of people are already using Nuitka.

This is my most important slide. I want you to join the project and help me; I will guide you. One thing I have to cover is correctness. You see the Oracle of Delphi: the oracle means I can use CPython and compare with Python. So testing for correctness is a dream; it's very easy. For performance it's much harder. It's a race, and I have ideas, and what I would like you to do is help me come up with and develop a tool that will give the user feedback on performance. Because if you compile your code now, it may not be faster at all, it may even be slower, and we wouldn't know why. There's no feedback. There's no idea which function is slower or faster, and by how much. There needs to be a tool, and I need somebody with an interest to help me out with this and rescue us.

These are the most important things I meant to say. I would leave the rest of the time, and I hope it's still 10 minutes, for questions, if you have them.

Yes, okay: which Python language constructs, in my opinion, make code generation hard? I think that technically, once we are able to inline metaclasses and their effects, they will not be an issue; they are very, very easy, technically. Classes and instances need a lot of babysitting to be correct, especially under Python 2, so that was an issue. And I had a huge amount of difficulty with in-place operations and exceptions, especially exceptions. Exceptions are totally a nightmare.
And yes, reference counting is no fun, which is why I develop a compiler, so you do not have to write C code ever again.

Next question. The question was how I handle it if something doesn't have a type. The default type to me right now is object. Right now Nuitka is basically using no type information at all yet. What it will do, and let me show you this: in the future it will be able to understand ctypes, for example, and make direct calls. But right now everything is an object, be it a list, string or integer; I'm not using the knowledge yet. I will start to, now that I have this tracing capability. I can produce proper traces of Python, so I will be able to trace the list and make optimizations dedicated to types, but I don't have it yet. I'm integrating with libpython, and it's PyObject *. It's a standard Python object; it's like you wrote C extension code.

Another question. Yes, I tried various compilers. Obviously on the C level there's always still something to gain, and I'm trying to be clever and smart about the code I generate. I find the Microsoft compiler to be terrible, and Intel will probably be better. But technically, Nuitka should be in a position to understand Python. The example that I showed you gains an order of magnitude in performance just by inlining a function call. I don't have time to show the slide now, but if I just avoid a Python function call, I can have speedups in the domain of 20-fold and so on, and maybe on top of that a C compiler can add a few percent again.

Yes. Yeah, there was also a Python talk in 2014, and the presenter also said it just works.
I throw things at it. What you just tried is also something called standalone mode. I'm not mentioning that because I'm not interested in it, but it means it would also pack all the things together and allow distribution to other machines. Something people also expect from a compiler is to be able to take the code to another machine. My interest is mostly in acceleration, and I'm solving this on the side. But the feedback is that it just works, and it's at the point where I'm surprised if something doesn't work. For standalone, I'm not surprised if something doesn't work, because it's often very hairy with extension modules.

Yes. Yes, the bytecode is incredibly smaller, definitely; no need to talk about it. A binary which contains the Python implementation and uses it is larger, but I don't think it's anywhere near an important issue. Obviously, the smaller you are, the faster you will be, but yes, it's larger, and these are still small binaries. If we have a look at hg.exe, it's 31 megabytes, I think.

Yes, to confuse users; that's a very important part of my project. Suppose the program you compile is named hg. What is going to be the resulting name, without overwriting it? I want to put it alongside, and I've had good experience with .exe. It's relatively rare that the Python program exists as .exe. But I'm getting questions, so please take note: this one will run on Linux despite its name. And you can rename it, and it will still work, if you despise the name.

Stack memory allocation? Yes. What I will definitely need is some sort of list implementation that is not malloc-driven. If I know the size, and if I know enough things, I will use stack memory.
Yes, I'll try to be fast. The question is whether I take advantage of ref counting. I try to do this. I'm not always taking a ref count when CPython does, but it's a very marginal gain. The real direction must be to avoid Python objects wherever we can, and we will see how far this gets. But of course, where I can, I will have this analysis and know that I don't have to take another ref count because I will be holding one already.

No, there is no... He's asking about standalone. A standalone distribution, if you want to copy it to another machine, basically includes the standard library and all the libraries that the standard library uses, due to the incompleteness of code removal. So you end up with a larger set for the distribution. I don't think it's huge, but it's like a Python installation, I suppose.

Yes. I don't have real-world benchmarks, because that defies the purpose of benchmarks. I know that PyPy is really cool now with presenting real-world programs and how PyPy stacks up on them. I have this idea about Valgrind. All my benchmarking I'm doing with Valgrind, and Valgrind gives me ticks, so I don't have to run many times and I can make the analysis directly. I would want your help, or I will do it myself, but I want your help to create a tool which will run the program in Python, run the program in Nuitka, compare the two, and highlight which parts are fast and which are slow. Then you as a user can get a picture: how much speedup do I get in my program? It should be simple enough to just run your program under another tool and get a report. It's all laudable to have these benchmarks with synthetic code, but I would actually like to empower us: me, the developers of Nuitka, and the users. I don't know if we can make this tool, and the sad truth is that right now there's basically nothing.
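The coarsest version of such a comparison can be sketched in a few lines. This is not the tool asked for above: it measures whole-program wall-clock time rather than Valgrind ticks, and it gives no per-function breakdown. The function names `best_wall_time` and `compare` are invented for this sketch, and the two commands in the demo are stand-ins (both run under the current interpreter) for "the program under CPython" and "the compiled binary".

```python
import subprocess
import sys
import time


def best_wall_time(command, repeats=3):
    """Run `command` `repeats` times; return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare(baseline_cmd, candidate_cmd, repeats=3):
    """Return (baseline_seconds, candidate_seconds, speedup of candidate over baseline)."""
    base = best_wall_time(baseline_cmd, repeats)
    cand = best_wall_time(candidate_cmd, repeats)
    return base, cand, base / cand


if __name__ == "__main__":
    # Stand-in commands: an explicit loop versus the built-in sum().
    # In practice these would be e.g. [sys.executable, "prog.py"] and the
    # path of the compiled program (both hypothetical here).
    slow = [sys.executable, "-c", "t = 0\nfor i in range(10**6): t += i"]
    fast = [sys.executable, "-c", "sum(range(10**6))"]
    base, cand, speedup = compare(slow, fast)
    print(f"baseline {base:.3f}s, candidate {cand:.3f}s, speedup {speedup:.2f}x")
```

Wall-clock timing is noisy and includes interpreter startup, which is one reason instruction counts from Valgrind are attractive: they are repeatable without running the program many times.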
I just have random numbers of something, and I'm working very hard on getting somewhere. But like I said, I have time, and I'm not in a panic to be fast at everything tomorrow. Right now I am only starting to wonder about actual performance, so this is now where I want to know how good I am.

Okay, thank you so much. Thank you for being here and for the good questions.