usage. The tools there are, I find, slightly less efficient and slightly less useful in the end than what we have for time, but there are a few of them, so it can be useful to know about them.

First there is line-by-line memory usage, and for this we can use the memory_profiler module. If you have not installed it, you can install it with this little command. Let's see, there is a little question... Okay, so let's come back to RAM.

So, memory_profiler. The way it works is that you import it, and then you import what we call a decorator. A decorator, well, it decorates the function: you put it before a function, with @ and then the name of the decorator, and internally what it does is wrap your function inside another function, to tune a little bit the way it will work. So let's try that. Once you have defined it like this, you can then, outside on the command line, call it with python -m memory_profiler and the name of your script, and that does a memory profiling of this code line by line. With the Jupyter magic, you do %load_ext memory_profiler, which loads the extension, and then you call %memit my_function(). Sorry, I forgot to run this one here, and now that one, and this is what we see.

You see here it first begins with a little error and then it gives us some information. The error is fairly specific to the somewhat odd context of Jupyter cells: Jupyter takes this little cell, pushes it to a temporary Python file and then tries to hand that to memory_profiler, but internally something doesn't work there, so it's not really able to go through that. What this means is that it's not able to do line-by-line monitoring, but it can still report that the peak memory usage was 1.1 gigabytes, and that the increment during this execution was 160 megabytes. You can see here, for instance, another result that I got earlier. This is basically the current usage, meaning it takes into account everything that is already in my RAM at this moment; that's why, for instance, here I have more objects than when I did the first test, and that's why this number is larger, but the increment is fairly similar, although not exactly the same. It should also be said that this RAM measurement is fairly sensitive to time: operations that are too fast might be missed by the profiler, because it only checks the RAM every fraction of a second or so, so very fast increments can sometimes be missed.

Now, if we put our function into another script, for instance here I took the time before the course to write the same code in a file, I can call it more easily, and now, because it comes from an external file, it is able to do the line-by-line profiling, so we can see line by line what happens. Here it profiles the RAM used when we enter the function; then you see that I create a, which has one million elements, so there is an increment of 7.5 megabytes; then I create b, which is much larger with 20 million elements, so the increment there is about 20 times larger; then I delete b, which frees this RAM, and then I return a, so that's just returning it.
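As a rough sketch of what such a script can look like (the file name and the exact array sizes here are only illustrative, not the course's file):

```python
# tmp_mem.py -- minimal sketch of line-by-line profiling with memory_profiler
from memory_profiler import profile
import numpy as np

@profile
def my_function():
    a = np.ones(1_000_000)     # ~7.6 MB increment should show up on this line
    b = np.ones(20_000_000)    # ~20x larger increment (~150 MB)
    del b                      # frees that memory again
    return a

if __name__ == "__main__":
    my_function()
```

From a terminal you would run something like python -m memory_profiler tmp_mem.py to get the line-by-line report, while inside Jupyter %load_ext memory_profiler followed by %memit my_function() only gives the peak memory and the increment.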
Okay, so now you see a little bit how this works line by line; it's fairly nice. Oh yes, of course, there is a question: can you explain what @profile does again? So @profile is what we call a decorator. What it basically does is wrap this function inside another function, which is called profile, so it tunes the way the function will be executed: when you call my_function, it's not only the code that you wrote that runs, it's that code plus whatever the profile modification adds, and profile just sets up the RAM monitoring, either whole-run monitoring or line-by-line monitoring, depending on where the execution is launched from. Does that clarify it? Yes? Okay. Again, do not hesitate if you have any questions or if you would like me to delve a bit deeper into one subject or another.

Right, so now we can try to do the same thing with our pairwise distance function. What I'm going to do is import profile from memory_profiler, set up the profiler and set the precision: the default precision is fairly low, so I increase the precision of the memory profiler because we have small increments here. Then with this magic I write the cell to a file, so this cell will not be executed, it will be written out, and when I execute it, it says "writing tmp.py". Now I can time it to check how long it takes when it's in an external file, and you can indeed check that there is now a tmp.py file there. What you see is that it took 13 seconds, and you can sort of see what happens: we start with fairly small usage, then we increase, and we have the creation of all our distances and so on, but you see that there is no specific jump there, and the increase actually happens inside this function. That's related to the internal workings of Python, which I think assigns memory on a lazy basis, and we see here that per increment it increases by 0.258 megabytes, so then we have this final usage: we come from here to that, and at the end this is what is left.

What I also want to convey to you is that it's not super easy, with memory_profiler at least, in all the tests that I've done, to check easily what is happening, because first you need to export things to a file, and then it can sometimes be a bit counter-intuitive exactly where the increment is reported. Here you see it is reported on the for loop line, whereas it's actually the assignment into the matrix that causes the increase; it's just reported on a line where we would not necessarily expect it. There's a question by Jörg: yes, for him it looks different, the increment is two rows below, so it is in the for loop. Okay, interesting. As I said, I think the main reason is that the monitoring itself is somewhat time dependent. My experience with this module is that it's not the nicest to play with. It exists, and I think that for fairly simple cases like this it performs quite well, but when you start to have several for loops and calls to different functions and so on, it becomes a bit difficult to work with, and I think this illustrates that fairly well.
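The cell that gets written out to tmp.py could look very roughly like this (the pairwise distance code here is only a placeholder, not the exact course version):

```python
%%file tmp.py
# Sketch of an external script for line-by-line profiling of a pairwise
# distance computation (placeholder code; the real course script differs).
import numpy as np
from memory_profiler import profile

@profile(precision=3)            # higher precision, useful for small increments
def pairwise_distance(vectors):
    n = len(vectors)
    dist = np.zeros((n, n))      # the memory increase may be reported later, lazily
    for i in range(n):
        for j in range(n):
            dist[i, j] = np.sqrt(np.sum((vectors[i] - vectors[j]) ** 2))
    return dist

if __name__ == "__main__":
    pairwise_distance(np.random.rand(100, 50))
```

Running it then, for example with !time python tmp.py, prints the line-by-line report, because the script imports profile from memory_profiler.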
One other thing I want to point out: if I do the same thing but without the profiler, you see that what took 13 seconds with the profiler takes only 86 milliseconds without it. You can see it a little bit there too: the memory numbers are all fairly big, but it's the run time with the profiler that is really high. This memory profiler has an extremely high overhead, which means it kind of worked here because we played with very small data, but if I were to use any slightly larger, real data, this would take hours, and for a somewhat subpar result, as you have seen. So that's the trade-off: I would say it can work, and sometimes it's quite useful, but sometimes I would think twice before using it.

Okay, so instead you can sometimes prefer another alternative, which I like and use more often: time-based memory usage. Basically you have something external that monitors what happens, not line by line, but on a time basis. Again you always have the limitation that if memory changes too fast, with big peaks and dips, you might have to adapt how often you monitor the memory in order to catch those quick changes, but most of the time this works fairly well, and the sampling interval is a parameter that is easy to change.

Here it's the same code, but it's slightly odd, because you do not want to import memory_profiler in the script, otherwise it falls back to the line-by-line profiling, so there is a little trick there, which I sketch below. Otherwise it's still memory_profiler, you just use its mprof executable. So I do this, I have a tmp.py again, but this tmp.py does not import memory_profiler. You can see I use bigger data than before, because I know from my testing that the overhead is smaller. Then I run the script: whenever I start a line with an exclamation mark, it is sent to the command line, so: time mprof run tmp.py. If you are on Windows, the time part may not work, in which case you can just remove it, and of course you need to have memory_profiler installed. When I run that, it tells me that it samples the memory every 0.1 seconds; mprof has an argument to increase or decrease that. You can also see that it took three seconds to run, so here the overhead is actually very small: with 500 vectors the pairwise distance computation itself takes about 25 times longer than before, so I would expect something close to maybe 2.5 seconds, and now I am somewhere around three seconds, so the overhead is not too much of a problem anymore.

It also generates a file with a name something like this, with a date and time in it, so of course that will change; it's not really meant to be read by yourself. For that we call mprof again, with the subcommand plot, and we tell it to take this profile (by default it takes the last profile generated, but you can also manually specify which profile should be plotted) and put it into a file called tmp.png. You run that, it takes the profile and creates this image file, and when I show it, it looks like this: what you have here is the execution time and the memory used.
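A sketch of how that trick and the mprof calls can look (the distance code and sizes are placeholders again; the key point is that profile is not imported, since mprof run supplies its own decorator):

```python
%%file tmp.py
# Sketch of the time-based profiling setup (the real course script differs).
# Do NOT import profile from memory_profiler here, otherwise it falls back
# to the line-by-line mode; `mprof run` provides the decorator itself.
try:
    profile                          # injected by `mprof run`
except NameError:                    # fallback so the script also runs standalone
    def profile(func):
        return func

import numpy as np

@profile
def pairwise_distance(vectors):
    n = len(vectors)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            dist[i, j] = np.sqrt(np.sum((vectors[i] - vectors[j]) ** 2))
    return dist

if __name__ == "__main__":
    pairwise_distance(np.random.rand(500, 50))   # bigger data: the overhead is low
```

You would then run !time mprof run tmp.py (sampling every 0.1 s by default; there is an --interval option to sample more or less often) and afterwards !mprof plot -o tmp.png to turn the last recorded profile into the image.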
Coming back to the plot: the @profile decorator here has flagged this function so it gets bracketed. You have the time, so you have the pairwise distance profile there, it took 1.7 seconds, so it's here, and you can see how it works: there is a slight increase at the beginning and then a small increase all throughout the execution time, which seems to reflect that as we populate the matrix it actually takes up and uses memory. The reservation, or rather the usage, of the space does not happen in one big jump but step by step, and each little cross here represents one point where the memory was sampled.

So you don't have the line-by-line view, but if you have a reasonable idea of what is in your code, I find it much nicer for seeing exactly how things happen, at what speed, and whether there are big peaks or smaller ones, and there are options to change the sampling if you want to monitor your RAM usage more finely or more coarsely. You also have the added benefit of the timing, which lets you see how long it took to go through each of these bracketed sections. It should also be noted that this works with a Python script, but it actually works with all sorts of executables: mprof can be run on any kind of executable, it's just that only with Python do you get these nice little brackets. There's a question asking whether more than one function can be monitored, and indeed it can: I would just need another function decorated with @profile, and I would then get another bracket, so that you can see precisely what happens where. Personally, I would say this is my tool of choice if I have to monitor the RAM precisely.

There we go. And as you can see, as I said, the overhead is much, much smaller; it's actually quite manageable, because here we see three seconds versus 2.1 seconds, so it's doable to accept roughly 30% more execution time, whereas the line-by-line profiling is more like a multiplication by 10, or even more. One other little thing you can see is this big jump at the beginning: it corresponds to the import of the libraries, so this also tells you how much that costs, and how important it can sometimes be not to import too many unused libraries.

So there you go, that gives us two options for the monitoring: line by line, which is expensive due to the overhead, and time based. Of course, if you do very, very fast operations you might have to reduce the interval of your monitoring, maybe to every 0.01 or 0.001 seconds, but that then comes with a slightly increased overhead, because you do more monitoring.

Now, what remains is what we do once, let's say, you have done a few executions of this and you have an idea about which memory structures take a lot of space. Typically, if we come back to what we had before, when we try to create this matrix, whose size is the number of elements by the number of elements, so one million by one million, it fails because it would require 7.28 terabytes. So sometimes it's interesting, when you know exactly which objects are problematic, to know what size a single object will take in memory.
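Just to make that 7.28 terabyte figure concrete (simple arithmetic, assuming 8 bytes per double-precision value):

```python
n = 1_000_000
n * n * 8 / 1024**4      # ~7.28 TB for the full n x n float64 matrix
```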
Then, knowing your own RAM, or the RAM of the computers you are going to execute your code on, you know how far you can go before it breaks down. For that, the first thing you can use is sys.getsizeof, a very simple function from the sys module. So for example you get this: the size of a single float is 24 bytes, and the size of a string containing the six characters "123456" is 55 bytes. To get kilobytes you divide by 1024, and for megabytes you divide by 1024 twice. So, for example, I now create a 1000 by 1000 matrix and I see that it takes 7.63 megabytes; a 10 000 by 10 000 one takes 100 times that, 762.94 megabytes, and so on, and that's how we can quite quickly get to something bigger than my RAM.

That's simple enough, but there is a limitation to getsizeof, which I am going to demonstrate briefly. Consider this: the size of one float, and now a list of 10 floats. What do we see? One float is 24 bytes, whereas 10 floats are apparently 184 bytes, which is not really what you would expect: you would expect at least 240 bytes, plus whatever overhead comes with the list. So something happens there. The limit of getsizeof is that it struggles with containers, in particular lists and dictionaries and the like, and even more when they are nested, because it only gives you the size of the list itself and not of its contents. For that, if you google a little, there is a, let's say, classical function that we always copy and paste, called total_size: you give it some object, it has a few handlers for the different container types, and it goes through all the levels of the object, which may be nested with different sub-containers and so on, sums up the sizes of everything in there and returns the total. And now, if we execute this function on our list, we do get a size that makes sense: 10 times 24 bytes for each float, plus the 184 bytes, which was just the size of the container and not of the contained elements, so we get 424 bytes, the actual size of the object in memory with all of its content.

So that's a little caveat; be mindful of it, because if you don't know about it, it can be very counter-intuitive. Note that because NumPy arrays do not reference their data from elsewhere but store it contiguously in memory together with the array object, you don't really see this problem with NumPy arrays, so it's not necessarily something you need to pay much attention to if you work only with NumPy arrays, but as soon as you have different types of containers and so on, be mindful of it.
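As a rough sketch of all this (the exact byte counts depend on your Python version and machine, and the total_size here is a simplified version of the recipe you find online):

```python
import sys
import numpy as np

sys.getsizeof(1.0)                                # 24 bytes for a single float
sys.getsizeof("123456")                           # 55 bytes for a six-character string
sys.getsizeof(np.zeros((1000, 1000))) / 1024**2   # ~7.63 MB, the data comes with the array

xs = [float(i) for i in range(10)]
sys.getsizeof(xs)                  # ~184 bytes: the list object only, not its contents

# Simplified version of the classic "total_size" recipe: recurse into
# containers and sum the sizes of everything they contain.
def total_size(obj, seen=None):
    seen = set() if seen is None else seen
    if id(obj) in seen:            # count shared objects only once
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size

total_size(xs)                     # ~424 bytes: 184 for the list + 10 * 24 for the floats
```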
Okay, now a second micro-exercise for you: find out which is the largest square matrix that your RAM could reasonably accommodate. My RAM is 32 gigabytes, but maybe yours is different, so try to find out this number. You can take a trial-and-error approach, or you can be a bit more mathematical about it and not crash your computer; how you want to tackle it is up to you. And then, if you have a bit of extra time, try to think about how we could modify the main script to make it less memory hungry. While you solve that, I will paste the main script there, so that we can then discuss together how we would do the modification. So I'll leave you to it; as usual, please ask any questions in the chat, by voice, or in the Google doc, and please put a little green tick when you are done with the first part: finding out the largest square matrix your RAM can reasonably accommodate. You can also all write the RAM of your computer in the chat, and if you don't know it, you can try googling how to find out the RAM of your computer; it's always useful to know.

All right, so we have done the correction for this little part. I'm going to paste in the chat the little code we used to do the monitoring, so that you can easily replicate it and adapt it to your own system. That's how we can know how large a matrix will be without having to actually create it, just by understanding the rule that applies underneath and then doing the computation ourselves.

Now, the next part was about taking our script and thinking about its structure, so that we can maybe make it less memory hungry, such that even if the file is super large we can still run. Here we have to remember that this is what causes the problem, this is what is too big. So now my question to you: if you have had time to think a bit about it and you have an idea, please write it in the chat or raise your hand to explain it, and we'll see what we can do. No particular ideas, or... ah yes, you are just typing. So Jörg proposed that we could use a smaller data type with less precision, and not store the duplicated values, and then there is a second suggestion: could you use a generator, for example? For the generator: yes, but at some point you still have to store these results somewhere, no? Or how would you do that? Maybe you can be a bit more precise.

Now, Jörg, I think what you propose is already a good idea in itself. First, you could realize that this square matrix stores each result twice, so we could use some sort of triangular matrix format where we only store things once. So we would go, remember, from about 7.28 terabytes down to about 3.5 terabytes just with that; still not tractable on our little laptops, but a bit better. Then you say, okay, maybe we can go from a very precise format, double precision, which is floats represented on 64 bits, down to 32 bits, and decide that this is precise enough. This is something you would want to check, exactly what precision you actually need for your computation, but it is something we can indeed do, and then we would divide by two again, so we would go to something like half of that. So now a lot of big HPC machines, big supercomputers, would be able to handle this and help us, and we would be able to do the computation, but we would still need to go there to do it; I would not be able to do it on my laptop. But those are two very good propositions, so well played.
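To put numbers on the two propositions and on the micro-exercise (a sketch; 32 GB is just my machine's RAM, and in practice you would not fill it completely):

```python
n = 1_000_000
full_tb       = n * n * 8 / 1024**4   # full float64 matrix: ~7.28 TB
triangular_tb = full_tb / 2           # each pair stored only once: ~3.6 TB
float32_tb    = triangular_tb / 2     # single precision: ~1.8 TB

# Largest square float64 matrix that fits in 32 GB of RAM
# (leave headroom for the OS and the rest of your objects in practice):
ram_bytes = 32 * 1024**3
n_max = int((ram_bytes / 8) ** 0.5)   # 65,536 rows and columns
```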
Now, does anyone have another idea on how we could think about our problem slightly differently, so that we could run this on a normal laptop? So, from David there is a nice proposition: we could divide our big matrix into smaller sub-matrices and then store those sub-matrices on the hard drive, on the disk, as we go. That's a very good idea. The idea is that when you don't have enough RAM, maybe what you want to do is store temporary results on the disk. This will slow down your execution time tremendously, because reading and writing to the disk takes time, but at the very least your problem becomes tractable. As I said, remember: it's always possible to wait a bit longer, it's not possible to increase your RAM. So that's actually a good solution, and in many cases that's maybe what we would do: we would start with what Jörg proposed, and from there we would divide the problem into smaller pieces that we store on the hard drive.

Now, for this specific problem, my personal proposition starts from the observation that step four computes the pairwise distance matrix, and then we directly write the matrix without doing any further computation on it. So why do we actually need to store the matrix at all? Why not write it as we compute it? We compute one little bit of the matrix, we write it, and we never need to remember the rest of the matrix. This is of course a little specific to what we do in this particular code, but whenever you encounter this sort of problem, you always have to get, let's say, specific in your solution.

The way I would do it here is to change the main script so that we no longer allocate that matrix, and instead mix the writing with the rest of the computation. So I mix it in here, bring this back one indentation level, and of course then you want to merge both sets of instructions: we first write a header, and then, for each of these elements, we print them as they come. So I write that: I just keep a single similarity value, and I print it with end set to nothing, so that we don't go back to the line, and of course the file we write to is out. Then we need to take care of the separators, and how you do that can vary a little, but what I will do is say that if j is not zero, I print a comma first: if j is zero there is no comma before the value, otherwise there is. And the last thing we need is that whenever we have finished with one sequence, we go back to the line, so we just print the newline character, again with end set to nothing and with the file being out. And then we have it.

I might have made a small mistake because I wrote that very fast, but I think the concept is sound: I open the file, and as soon as I have computed a sequence similarity, I push it, I write it to the file; I never have to store the whole matrix. Okay, so now you see the process a little bit. Does it make sense to you, what I just did and how I did it?
Yes for a few of you; for the others, maybe you need a bit of time to digest this. Okay, cool. So then, with this new structure, I don't need to allocate anything anymore, and this should be super lean in memory.
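A minimal sketch of that streaming structure (the function name, the header format and compute_similarity are placeholders I'm using for illustration, not the exact code from the course):

```python
def write_pairwise_similarities(seqs, out_path, compute_similarity):
    """Write the similarity matrix row by row instead of allocating it."""
    with open(out_path, "w") as out:
        # header: one column name per sequence (format is illustrative)
        print(",".join(f"seq_{i}" for i in range(len(seqs))), file=out)
        for i in range(len(seqs)):
            for j in range(len(seqs)):
                if j != 0:
                    print(",", end="", file=out)   # comma before every value except the first
                sim = compute_similarity(seqs[i], seqs[j])
                print(sim, end="", file=out)       # end="" so we stay on the same line
            print(file=out)                        # row finished: go back to the line
```

Each value only exists long enough to be written, so the memory footprint stays flat no matter how many sequences there are.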