Hello everyone, I am Ankit, and I have spent the past five years in the industry across multiple roles, with one core focus: fetching insights from data faster. I started off as a big data scientist, deploying analytical models on some of the biggest databases available, like Teradata and Aster, and right now I am working on deploying analytics at the edge, on IoT devices. Today's talk is about real-world Numba: we will discuss some real-world applications of Numba, and I will show you how it offers a path of least resistance.

You might be wondering what the path of least resistance is. It is simply the path that is most travelled, the one that requires the least effort. Especially when it comes to open source software, we tend to follow the path other people are using, because it promises a better chance of success. I will be talking about this path with respect to Python. When we run a Python program, deep down we all have the same wish: "I wish my code was a bit faster." So the chief question is: is there a way to speed up my code without making a lot of effort, so that the least possible effort goes into the changes that make it faster?

To show you this path, I will take you on a journey. It all started in 2010, when I was working on a project at my college, IIT Kanpur, and came across Python for the first time. We all spent a lot of time in school learning Java and C++, and it was in college that I first learned you could build full-fledged desktop applications in Python. Way back in 2010 I made this fractal project. You can see a lot of colourful buttons, so don't judge me on that; there were no material design concepts back then, and it shows my enthusiasm. One of the things I wanted it to do was generate a Mandelbrot set: an 800×600 Mandelbrot set, which took around 30 seconds to generate. There is an equation you have to compute for each and every pixel before you can plot the colours, and that took around 30 seconds. I was happy with that, because I had all the time in the world, but my project supervisor told me to focus on speeding up this generation and to look for alternatives. There were options like Cython and Boost.Python at the time, but while learning Python is quite easy, speeding up Python was not so easy back then. The path of least resistance I found was a project called Psyco. It is dead now, but all I needed to do was write "import psyco" at the top of my project, call psyco.full(), and that was it. It brought the time down to around 10 seconds, one third of the original, and I was happy with that. I then brought it further down to one second simply by reducing the size of the image. I was fine with that, because I still had a lot of modules to cover, like wxPython and VPython, and a lot to explore, so I did not look further into speeding up Python.
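(For reference, this is roughly all the Psyco integration amounted to. Psyco is long dead and only ever worked on 32-bit Python 2, so treat this purely as a historical sketch, not something to run today:)

```python
import psyco   # historical: Psyco only supported 32-bit Python 2
psyco.full()   # JIT-compile every function in the program from here on

# ... the rest of the fractal application stays completely unchanged ...
```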
So, present day. We are all familiar with the Python 2 sunset, and when I heard about it I went on a frenzy of porting all my past code to Python 3. That is how I rediscovered this old project, which was built on Python 2.5, and I thought: coming back to it a decade later, how can I speed it up? What methods are available to me now?

This Gantt chart shows what I call two decades of the need for speed. It lists the Python speed-up projects that have appeared over the past two decades: the projects that still continue, and the projects that gained a lot of traction and popularity but have since disappeared. The first was the Boost C++ Libraries, a pure C++ library. Then came Pyrex. Pyrex died down, but the Cython we know today started as a fork of Pyrex, so it lives on. Then there was Shed Skin, another well-known project, which faded away around 2013. Psyco, as I told you, died around 2012, but it was the spiritual predecessor of PyPy. Then we had other projects like Unladen Swallow (backed by Google), and later Pyston and Pyjion, also backed by big companies that have since shifted their focus away from speeding up Python. And around 2012, Nuitka, Numba and Pythran were started, with Numba started by the creator of NumPy; these three still live on.

So that is the overall timeline. To tell you more about Numba, let me explain how it differs from the other projects that are still alive, because there are a lot of trade-offs involved in this whole business of speeding up code. First, replacement: Numba is not a replacement Python interpreter like PyPy. PyPy is a whole separate implementation, with its own modules to run NumPy and so on; Numba is not a replacement for CPython. Nor does it translate Python code to C or C++ the way Cython, Pythran and Nuitka do, so it is not a translator either. I am not saying those approaches are bad; it is just not as easy to apply them to your code.

So what is Numba? The name is derived from NumPy plus Mamba (the mamba being one of the fastest snakes on earth), and it is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code. It takes a function, infers the types of the function's arguments, performs a bytecode analysis of the function, generates the Numba intermediate representation, rewrites that IR, and finally lowers it to machine code. That is what the whole workflow looks like. You would need to know all of this only if you go deeper; for now, let me show you how simple it is to actually apply it in our code.
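(Before the case studies, here is a minimal sketch of that workflow in action; the function and numbers are illustrative, not from the talk. The first call triggers type inference and compilation; subsequent calls run the cached machine code:)

```python
import numpy as np
from numba import njit

@njit
def total(arr):
    # a plain Python loop over a NumPy array: exactly the pattern Numba compiles well
    s = 0.0
    for i in range(arr.shape[0]):
        s += arr[i]
    return s

data = np.random.rand(1_000_000)
total(data)  # first call: argument types inferred, function compiled to machine code
total(data)  # later calls reuse the compiled version and run at native speed
```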
So, coming to the first case, which is again close to my heart: generating the Mandelbrot set. What you see here is pure Python code. You have a colour palette (you can see the plot is colourful), and you have pixels for each point across the length and breadth of the image, stored as a list of lists of tuples, where each pixel value is an RGB value. For each point on the complex plane you initialize a pixel value; you can see I have used the built-in complex data type to create a complex number representing a pixel. Then, inside two nested for loops, you iterate to find out whether that complex number has escaped, that is, whether its absolute value has grown beyond a threshold, and you colour the pixel according to the palette, based on the iteration at which it escaped. So this is pure Python, using only the basic data structures available in the language: list, tuple, complex.

How can you speed this up using Numba? All you need to do is add "from numba import njit" and put the @njit decorator on the function, and at runtime Numba will automatically compile your code into machine code. The speed-up in my case was around 32x, and you can see you achieve it without applying much effort at all. The njit decorator is nothing but the jit decorator with the attribute nopython=True; it means you do not want any involvement of the Python interpreter, and you only get really good performance in this no-Python mode, which is why you should use @njit by default. There are other arguments too: parallel=True will automatically parallelize the code execution based on your system; fastmath=True will skip some strict checks in the mathematical calculations and speed your code up further; and cache=True will save the compiled code to disk so it can be reused later, saving you the compilation time on subsequent runs, which is not much, but you still save it. There is also an alternative of compiling ahead of time, but that is a different process.
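(Here is a condensed sketch of the Mandelbrot kernel with the decorator applied. This is my reconstruction, not the exact talk code; the palette handling is omitted and the plane bounds are illustrative:)

```python
import numpy as np
from numba import njit

@njit(cache=True)  # cache=True keeps the compiled code on disk between runs
def mandelbrot(width, height, max_iter):
    # escape-time iteration count for every pixel of the complex plane
    counts = np.zeros((height, width), dtype=np.int32)
    for y in range(height):
        for x in range(width):
            c = complex(-2.0 + 3.0 * x / width, -1.0 + 2.0 * y / height)
            z = 0j
            n = 0
            while abs(z) <= 2.0 and n < max_iter:  # threshold on the absolute value
                z = z * z + c
                n += 1
            counts[y, x] = n  # the escape iteration maps to a colour in the palette
    return counts

counts = mandelbrot(800, 600, 100)  # the 800x600 grid from the talk
```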
Coming to the second case, which is a scientific computing problem: solving the diffusion equation. The equation is transient in time, and you have to solve for the temperature field in space. You apply a finite difference method to compute it: a central difference in space and a forward difference in time. You have a rectangular geometry, you initialize some boundary conditions, and then you march the solution forward in time. The pure Python code, which uses lists, looks like this: inside two for loops you compute the second differences of the temperature field across x and across y, that is, the gradients, and then the final temperature for that time step. It is purely mathematical computation. Suppose it runs at some speed, x.

Then I went to a conference and met someone who said, "why don't you apply NumPy? It is fast." So I used NumPy: I kept the core structure the same, the two for loops, and just replaced the list of lists with a NumPy array. And it deteriorated my speed: I was now at 0.32 times x, and I thought, why am I using NumPy at all? I reached out to the person again and said, please help me, the performance has deteriorated. He said you have to use NumPy in a particular way to get optimal performance: use this vectorized implementation. He sent me the code, I ran it, and my speed was boosted 55 times. I was quite happy that it worked, but you can see the whole structure of the code has now changed. It is no longer intuitive to understand what is happening; all the mathematics is hidden in a single line that computes over the entire domain at once.

Then I went to another conference and met someone who said, why don't you put Numba on top of NumPy? It will make the code even faster. Quite happy, I did "from numba import njit", added the @njit decorator to the vectorized code, and my speed deteriorated again. Again I was confused: what am I doing wrong here? I sent my code back to the guru, he made some modifications and sent it back, and... looks familiar? It was the path I had started with. I was back to square one. All I had to do was "from numba import njit", apply the decorator to the original loop-based code, and I could have kept my code's readability while gaining a performance boost of around 370 times. You can see how intuitive Numba is for scientific computing and data processing: even when you use NumPy arrays, you can unroll the for loops and still get that kind of performance.

Now let me go a bit deeper and show you the difference between the two codes, and why my performance deteriorated when I put Numba on top of the NumPy vectorization. Numba provides some command-line utilities that can generate annotated HTML files showing what is happening behind the scenes. This was the code with the array operations, and you can see it is calling some underlying implementations: for an array operation, Numba has to call its corresponding NumPy array-operation support, and that carries a lot of overhead. Whereas if I unroll the for loops, there is no such overhead: Numba analyzes the code as a whole and converts it into machine code, which removes the overhead and gives you the performance.
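(For concreteness, here is a minimal sketch of that "back to square one" version: plain nested loops over a NumPy array with @njit on top. This is my reconstruction of the FTCS scheme, assuming unit grid spacing and a diffusivity alpha, not the speaker's exact code:)

```python
import numpy as np
from numba import njit

@njit
def diffuse(T, alpha, dt, steps):
    # FTCS: forward difference in time, central difference in space
    ny, nx = T.shape
    T_new = T.copy()
    for _ in range(steps):
        for i in range(1, ny - 1):
            for j in range(1, nx - 1):
                # central differences for the second derivatives (unit grid spacing)
                d2x = T[i, j + 1] - 2.0 * T[i, j] + T[i, j - 1]
                d2y = T[i + 1, j] - 2.0 * T[i, j] + T[i - 1, j]
                T_new[i, j] = T[i, j] + alpha * dt * (d2x + d2y)
        T[:, :] = T_new  # advance one time step; the boundary values stay fixed
    return T

T = np.zeros((100, 100))
T[0, :] = 100.0                                 # hot top edge as a boundary condition
T = diffuse(T, alpha=0.25, dt=0.1, steps=500)   # readable loops, compiled to machine code
```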
Coming to the third case, an artificial neural network. Through this case I will show you how to incorporate Numba inside a class, since in Python we tend to follow a class structure. Suppose there is a class for a neuron, with various attributes and some methods, such as initializing and setting the weights. Take one such method, update_weights: it has a for loop and does certain computations. Now suppose I want to speed it up using Numba. If I just want to speed up that one method, I can turn it into a static method and apply @njit on top of it, and Numba will compile only that method, which can then be used efficiently from the class.

But you also have an alternative, which is using jitclass. With jitclass you have to specify a spec, which is nothing but the data types of the class attributes we saw, because jitclass does not do automatic type inference on attributes; you have to tell it what the attributes are and what their data types are. You add the @jitclass(spec) decorator to the class, and Numba will compile the entire class into jitted code.
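(Here is a minimal sketch of the jitclass approach, assuming a bare-bones neuron with just weights and a learning rate; the names are mine, not from the talk, and note that in recent Numba versions jitclass lives under numba.experimental:)

```python
import numpy as np
from numba import float64
from numba.experimental import jitclass  # older Numba versions: from numba import jitclass

spec = [
    ("weights", float64[:]),  # no automatic inference for attributes:
    ("lr", float64),          # every attribute's type must appear in the spec
]

@jitclass(spec)
class Neuron:
    def __init__(self, n_inputs, lr):
        self.weights = np.zeros(n_inputs)
        self.lr = lr

    def update_weights(self, inputs, error):
        # the whole method is compiled along with the rest of the class
        for i in range(self.weights.shape[0]):
            self.weights[i] += self.lr * error * inputs[i]

n = Neuron(3, 0.01)
n.update_weights(np.array([1.0, 2.0, 3.0]), 0.5)
```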
So when is Numba a good idea? Numba is a good idea when you have numerical algorithms, that is, lots of for loops, with data in the form of NumPy arrays or other flat data buffers; and when you have small chunks of code, perhaps many different pieces, but a handful of well-encapsulated chunks that you can compile to machine code to speed up the process. It also fits well into a data science pipeline. For example, suppose you use scikit-learn's k-means for the entire clustering process, but you have come up with a new cluster-validity index and want to evaluate your clustering with it. If scikit-learn has no provision for automatically calculating that particular index, you can implement the computation of the index yourself, put @njit on it, and it compiles into jitted code that gives you the performance boost. So it fits well in your data science pipeline, and that is well established by the fact that Dask and RAPIDS, Dask being one of the leading scalable platforms for data analytics and RAPIDS being the equivalent for CUDA, both work closely with Numba: you can run your Numba code on those platforms. Also, if you are running some form of computation on an IoT device, for example a Raspberry Pi, you can install Numba via Berryconda and use it there. The same Mandelbrot set that I generated earlier, I computed on a Raspberry Pi, and I got around a 20x performance boost using Numba on Berryconda. So you can definitely try that out if you are doing some sort of computation on IoT devices. And that's it. Any questions?

Q: Hi, thanks for the talk; it looks like a pretty powerful tool. One doubt I have is about the traceback, especially if you have machine learning computations or anything like that going on. How does the Numba JIT affect the traceback?

A: Are you referring to debugging Numba-jitted code? Yes, you can definitely land on errors, for example by using some complex data type that Numba does not understand. Numba currently has an experimental GDB integration that can be invoked from inside Python, and you can also use an external GDB toolkit on the compiled Numba code. So external GDB support is there, built-in support is being experimented with, and they are working towards giving more useful tracebacks.

Q: Hello, hi, my name is Rohit. You mentioned an example where you were using NumPy and your speed got reduced, and then you used NumPy with normal for loops and it increased to 370x. So can we combine the power of Numba with NumPy arrays and get even more speed, maybe more than 370x?

A: The issue was not actually about the arrays. What happened was that we were using operators, plus and minus, that NumPy overloads to perform whole-array operations. Numba has its own implementation for those NumPy array operations, and what it does is push everything back to the NumPy compiled code, so those calls were isolated and you paid that overhead. But if you unroll everything into for loops, the entire code is analyzed by Numba as a whole; it can parallelize certain parts and combine them, and that is why you get that boost when you unroll the for loops. And it is not that you cannot use NumPy functions; for example, np.sin or np.cos have corresponding implementations in Numba. Just be careful and compare the performance: there are times when plain NumPy performs better and times when NumPy jitted with Numba does not perform better, so you need to do an analysis between the two and take a call.

Q: Hi, I saw that in the examples you are only using pure Python functions. Does it also work with extension libraries? For example, if I am using something backed by Boost.Python, does it still work in that case?

A: I was using pure Python just to show what data types it supports. Right now, as function arguments it supports the primitive data types in Python and also NumPy data types. If I pass in some complex data type as a function argument, it will not understand it; as I said, it is a subset of Python and NumPy, so it may not support some complex data type coming from some other library. For example, if you try to pass in a pandas DataFrame, it will not work.

Q: Hello. The decorators you used, you just put a simple decorator on top of the function. I am assuming Numba will perform better if it knows the data types, or does it need the data types to perform well? Is that correct?

A: The @njit decorator will automatically determine the types; for jitclass you have to declare the types. If you do supply an explicit signature to njit, you might save some of the compilation overhead on the first run, but after that Numba will have inferred the types anyway and performs equally.

Q: I am thinking of a very trivial example. Say I have an adder function that takes two parameters, a and b, and returns a + b. If I pass 1 and 2, it returns 3 in pure Python; if I pass strings "a" and "b", it returns "ab". So how does Numba handle that?

A: When you run it through Numba with the arguments 1 and 2, on the first call it determines that the arguments are integers, compiles accordingly, and returns the result. If you then call it with strings, again on that first call it infers the data types and gives you the corresponding result. For successive calls with the same types it reuses the compiled version, but for each different combination of data types it infers the types and compiles a new specialization.
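(A small sketch of that specialization behaviour; the function is mine. Each new argument-type combination triggers a fresh compilation, and the dispatcher keeps one machine-code version per signature:)

```python
from numba import njit

@njit
def add(a, b):
    return a + b

add(1, 2)      # first int call: compiles an (int64, int64) specialization
add(1.5, 2.5)  # first float call: compiles a separate (float64, float64) one
add(3, 4)      # reuses the existing integer version, no recompilation

print(add.signatures)  # e.g. [(int64, int64), (float64, float64)]
```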
Q: Hi, this side. I really liked the talk. I was actually planning on porting my code to Cython, so I am glad I came across this tool; it seems almost too good to be true. My question is, what is the other side of the story? What are the trade-offs? Are there any pre-processing delays?

A: The pre-processing delay is not much of a trade-off. But since Numba supports a subset of Python and NumPy, you might come across unsupported operations. For example, take calculating a NumPy array's mean: Numba might not support the axis argument, so it may not support the full set of arguments that are in the actual NumPy documentation. The Numba documentation clearly tells you which subset it supports, and while running the code you will go through some tracebacks whose error messages clearly tell you what is supported and what is not. So you will come across this once you run the code, but if you have any issues you can ping me on GitHub or Twitter.

Q: Hello, thanks, that was a great talk. My doubt is, you talked about Numba's use cases with NumPy and with certain for-loop situations. How well does it work with pandas? Because in pandas, certain operations on columns might not be vectorized, so you need custom operations for those.

A: As I said before, it will not work with pandas data types directly. You have to take the DataFrame's .values, which is a NumPy array, and do the computations on that.
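(A minimal sketch of that pattern, assuming a numeric column; the example is mine, not from the talk. You hand the jitted function the DataFrame's underlying NumPy array, never the DataFrame itself:)

```python
import numpy as np
import pandas as pd
from numba import njit

@njit
def rolling_ratio(values):
    # a per-element computation that pandas would not vectorize nicely
    out = np.empty(values.shape[0])
    out[0] = 1.0
    for i in range(1, values.shape[0]):
        out[i] = values[i] / values[i - 1]
    return out

df = pd.DataFrame({"price": [10.0, 11.0, 12.1, 11.5]})
result = rolling_ratio(df["price"].values)  # .values hands Numba a plain NumPy array
```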