So, this talk is about how to profile applications. Can we turn off the lights? Excellent. Anyway, let's get started.

Let me show you some bars, drawn to scale. I won't tell you what these bars are about just yet; we'll go one by one and look at the proportions of different things. So, that's the first one up here. Yeah, I'm going to measure your head. Doug is a smart man — he's writing a new debugger, he's written a bunch of stuff which I really cannot understand — and Doug's brain is 20 centimeters long, right? I don't know how he manages to fit all of that into 20 centimeters.

Say you're working at your computer. You have your beautiful wooden desk, like everyone else, and your computer, and you don't remember something, so you reach for a book. Your radius of action at your desk is generally about 80 centimeters. Let me take someone else — what's your name? Okay, can I measure your arm? Okay, 80 centimeters. So if you have to reach something on the other side of your desk, it's 80 centimeters; it hurts your back if you try to stretch more than that. That's the stuff you have on your desk. If you have to go to the other end of the room, that's about three meters. You have to stand up, get the book, sit back down, read it and say, oh, okay, now I have it in my brain. Then there's something that's really far away — it's not even inside your apartment, it's more than 10 meters away. Keep scaling. It's not at the end of the hall; it's not even 25 meters away. This thing is far away. It's not the corner store — people with good eyesight, unlike mine, can barely see the orange stripe here. So what is this thing that is so far away? We're starting to see something here: this is the campus, and we are somewhere around here. No — it's much farther away than that. It's 3,200 kilometers away, right over there. So what the hell is that thing that is so far away?

Now, instead of distances, let's make these nanoseconds — something we programmers are very familiar with. One instruction every half nanosecond: say it takes half a nanosecond to access a register and run an instruction. That's the first bar there, and this is exactly the same scale as the brain, the desk, the room. If you have to touch something that is in the L1 cache of the processor — not in the registers, but something that has already been fetched from memory — it's really fast, only two nanoseconds. But it's still four times slower than a register. If you really have to go to memory, not the cache, memory accesses take about ten nanoseconds these days. So if your program is walking over a huge array, you end up paying ten nanoseconds for each access instead of half a nanosecond — that's a factor of 20. Keep scaling: that's one disk seek. One disk seek is eight milliseconds. Your program has to access a file on disk, and that makes the drive move the position of its head — that's eight milliseconds. That's about seven orders of magnitude more than accessing a processor register, and almost six orders of magnitude more than accessing something in memory.
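To put rough numbers on that scale — taking the half-nanosecond register access as Doug's 20 centimeters of brain, which works out to roughly 40 centimeters per nanosecond — the bars look something like this:

    register access       0.5 ns                 ~20 cm     (inside your head)
    L1 cache hit           2 ns                   ~80 cm     (arm's reach across the desk)
    main memory access     10 ns                  ~3-4 m     (the other end of the room)
    one disk seek          8 ms (8,000,000 ns)    ~3,200 km  (another country entirely)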
And there's nothing in between. Things you do up here are going to run fast; there's nothing between that and actually hitting the hard drive. So you are either really fast or really slow, and there's nothing else possible.

There's this famous man, Mr. Packard, who once said something like this: processor speeds keep increasing — processors are getting faster all the time — while hard drives have barely improved their speed in years. So we are making our machines wait all the time for the hard drive. And later this other famous man, Thomas Postman, wrote this on his blog a little while ago: he bought a new ThinkPad, which is much, much faster than his old PowerBook — a far more powerful processor, not a lame operating system like MacOS — and still he feels his new machine is about the same speed as his old one. The problem is the hard drive: both machines hit the hard drive just as much, so both machines feel equally slow to him. This is a very informal chart of how things have been going in terms of speed. Disks have barely improved. At some point processors overtook memory — that was somewhere between the 386 and the 486; the 486 got a memory cache on the processor because main memory couldn't keep up with the processor anymore. And it's been going like that ever since. So there's no way our programs can be fast if we do too much disk access or too much memory access.

Why do we bother optimizing things at all? It certainly feels uncomfortable to use a slow program — you say, I wish my machine were faster, and you sit there for several seconds watching programs load or watching something spin, or you sit for minutes while your machine boots. It's really uncomfortable. But that alone isn't much of a reason; we could just wait it out. Has everybody seen the new version of Microsoft Office, Office 12? They took Office and redesigned the complete user interface to make it more usable, and this comes from one of their usability experts: they found that when people ran into something that was really slow, they had a lot of trouble using it, because they said, okay, what the hell is going on here? I moved my mouse and nothing happens — and a little while afterwards, oh, something moves, finally catching up with my mouse. They discovered that when features are slow, and the programmers simply make them fast, they instantly become more usable without any changes to the user interface.

The other reason we want to make things fast is to dominate the world. The whole world is going to be using these little machines pretty soon, and they really don't have much: a slow processor, slow memory, no hard drive — just flash storage, which is, I don't know, probably 100 times slower than RAM, I have no idea; it's still a couple of orders of magnitude. So again there's this big gap. Also, wouldn't you like to be able to buy a cheaper laptop rather than the latest model? The laptop you want is always $3,000. So, how the hell are we going to make things faster?
When people want to make their programs faster, they immediately say, okay, I'm going to make this faster. Boom — there's their big idea. But how much faster? That doesn't let you set a goal. It doesn't let you see when you're done. It doesn't let you see regressions. If you make your program really fast, over time it will get slower again, because you will keep adding code, modifying things, changing APIs, and eventually it will be just as slow as it was before. So how do we keep that from happening? How do we optimize a program and make sure it is still fast after years of changes?

The thing you have to do is set a goal. You have to say: I want this pop-up menu to appear in under one tenth of a second. Once I reach that goal, I'm done; I don't need to optimize it any more. The user is not going to perceive any difference between the menu appearing in a tenth of a second and in a hundredth of a second, so there's absolutely no point in going faster. There's a guy working on a new mail program, and his goal is something like: I want my mail program to use less than three megabytes to show a mail folder with fifty thousand messages. He has been working towards that goal. Or: launch the program in under one second. OpenOffice these days takes about eight seconds to launch on my machine, and it's about the fastest laptop I could get. That's just ridiculous.

Once you set a goal, you can also tell when things go wrong again. You are not going to hit your goal on the first try — you have to take your timings over and over again — and by then you will have created a little benchmark for the thing you are working on: something that's easy to run and an easy way to reproduce your measurements. Then you can use it as a regression test in the future. That's what they have been doing with Cairo. The new performance test suite for Cairo is a big list of measurements: how long it takes to draw a polygon, how long it takes to draw a gradient, how long it takes to draw a big complicated Bézier curve. They record every run of the performance suite and keep a history of it, so they can compare the timings of the benchmark at any two points in time and see: okay, something went wrong between this version and that version, because now things are, like, five times slower.

We didn't know to do that a few years back. Stephan Kulow at SUSE did a lot of work for SUSE 9.3 to improve the boot time, and boot time in SUSE 9.3 was actually pretty nice. Then it went to hell for SUSE 10.0. It went to hell because we never had a way to keep taking those timings consistently: between SUSE 9.3 and SUSE 10.0, people kept adding more crap to the boot process, changing the daemons, changing the order in which things start, and now it's just as slow as before — right now SUSE takes, I don't know, one or two minutes to boot. So all the hard work this person did was wasted, because we never tested it again.
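Just as a rough sketch of what such a little benchmark can look like — this uses GLib's GTimer; the populate_menu() function and the tenth-of-a-second goal are made-up placeholders, not code from any real program:

    /* menu-benchmark.c: time one well-defined task against a goal.
     * Build (assuming GLib and pkg-config are available):
     *   gcc menu-benchmark.c -o menu-benchmark `pkg-config --cflags --libs glib-2.0`
     */
    #include <glib.h>
    #include <stdio.h>

    #define GOAL_SECONDS 0.1   /* "the menu must appear in under a tenth of a second" */

    /* Placeholder for the real work you want to measure. */
    static void
    populate_menu (void)
    {
            g_usleep (50 * 1000);  /* pretend this takes 50 ms */
    }

    int
    main (void)
    {
            GTimer *timer = g_timer_new ();
            double elapsed;

            g_timer_start (timer);
            populate_menu ();
            elapsed = g_timer_elapsed (timer, NULL);
            g_timer_destroy (timer);

            printf ("populate_menu: %.3f s (goal %.3f s) %s\n",
                    elapsed, GOAL_SECONDS,
                    elapsed <= GOAL_SECONDS ? "OK" : "REGRESSION");

            return elapsed <= GOAL_SECONDS ? 0 : 1;
    }

Because it exits non-zero when the goal is missed, a script can run it after every change and catch regressions automatically.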
So there's this guy, one of the pioneers of performance work in general. How do you decide what to start optimizing in a program? Say your program has two parts, or some little piece of your process has two parts. For example, to pop up a menu you have to create the menu, populate the menu items, load the icons for the menu items, create the corresponding widgets, and finally show the menu. And you think, by intuition: I bet creating all those labels takes a lot of time. So imagine the menu only had two parts to it — just the labels and the icons — and you say, I'm going to optimize the labels because I'm really sure they are slow. If you had taken your timings instead of going by intuition, you would have seen that part A takes this much time and part B takes this much time. It doesn't matter if the parts are not big, nice, contiguous chunks — it could be a little piece here, a little piece here, a little piece there — but when you add them up, they look like that.

So you work for two weeks, really, really hard, and you make the blue part five times faster. Everyone gets happy when they make something five times faster — that's a lot. But in the big picture it's not much faster at all: if the whole thing took 15 seconds, maybe now it takes 11. The blue part was only about five of those seconds to begin with, so even a fivefold speedup there buys you just four seconds. If instead you had made the red part merely twice as fast — which sounds like a much smaller improvement — you would have saved five seconds, and that wins. To optimize anything you really have to know what the slowest part is. You cannot do it by intuition; you have to take measurements on the machine, and then you'll see what to optimize. There's another consequence to this. Say the absolute fastest you can make each of these parts is: the red part can at best become twice as fast, and the blue part can at best shrink to this little square. If your goal is to get all the way down here, there's no way you're going to reach it if that is the fastest each part can go. So by setting your goals, and then analyzing how much faster you can make each part, you'll actually know whether you can reach your goals at all.

How do we measure? We use a profiler. Profiling tools in free software, in general, suck. gprof is pretty much useless: it doesn't work for shared libraries, it doesn't work for multi-threaded programs, it doesn't work for things you load with dlopen. So it's good for the examples in the gprof manual, and otherwise it's useless. I don't have sysprof installed on this laptop — does anyone have sysprof? Okay. You run sysprof, you hit the start button, you run your program, and then it shows you a tree of all the functions that were called in your program with their respective timings, and you can start seeing which functions took the most time. Sysprof, unfortunately, only shows you CPU time: if your application is waiting for disk I/O, it won't show you that.

Or you can use strace to monitor system calls. So what's the problem with strace? Once you run strace and give it the options that say, give me the timings for each syscall, this is the garbage strace prints: process ID, timestamp with microseconds, name of the syscall, parameters. And only a very patient person can go through that log. This is the kind of log I was getting a few months ago when I was profiling the start-up time of Nautilus.
I was trying to see why Nautilus takes about four seconds between the time you run it and the time it actually paints its window for the first time. Nautilus start-up is about 196,000 syscalls, and it's just impossible to see what the hell is going on in there.

So, back to Doug's brain. Can I? 20 centimeters by 15 by... okay, thank you. Roughly a thousand cubic centimeters of brain matter gives you a very sophisticated computer for processing visual information. Your brain understands this stuff when it can see it. Your brain wants to see porn, or pretty pictures — it has the capacity to understand pictures instantly; you look and say, I like that. It's automatic, it's really, really fast. So I spent two hours writing a little Python program that parses that strace crap and plots it like this. It's a timeline — one second, two seconds, three seconds, four seconds — and those little lines point at the checkpoints in the program. Immediately you can see that there's a big problem here, because all this crap is happening in a really small space. It's not worth optimizing this part, because from here to here — 6.32, 6.38 — it takes six hundredths of a second. While between here and here you can see it takes almost two and a half seconds. So there's something going on here which takes two and a half seconds, and you say, what the fuck is that? And that's actually the GNOME VFS daemon launching. So the next thing I did was ask, why the hell does the GNOME VFS daemon take such a long time to start? I started profiling the GNOME VFS daemon, fixed the problem, and now that part is pretty much instantaneous.

Same problem up there — let me see if this works. This is from GNOME 1.4: Nautilus used to have this little function that initialized all the icons Nautilus would ever use, so it read a big ton of files. Later, in GNOME 2.x, we got the icon-theme code to do all the icon registration and initialization anyway, so it turned out Nautilus no longer needed that really old way of loading icons. But nobody knew, because that code was really old; nobody had ever measured the timings, and nobody really remembered it was there. When we looked at this profile, we said, why the hell is it registering the icons? It doesn't need to do that anymore. So we removed that function and the one-line call to it, and those — what, 2.2 seconds? — just went away. So if you spend a little time building a little automatic tool, you'll have a much easier time. I ended up making like 30 of these charts, and you could just look at them and see the problem instead of wading through the raw log.

Yes? The question is how you specify the logging points — where you put them. Good question. The only thing strace shows you is syscalls. So at different points in the program I call this little function that does the access() system call. To access() you pass a string, normally a file name, just to check whether a file exists. But instead of passing a file name, I just pass a descriptive string. You could instrument things in fancier ways, but it's much easier to just drop these calls in at different points.
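A minimal sketch of that trick — the marker text and the mark() helper are just illustrative; the only point is that the string shows up, with a timestamp, in the strace log:

    /* access() as a cheap profiling marker: the call fails because the "file"
     * does not exist, but it still appears in the strace log with a timestamp,
     * so a post-processing script can find it and measure the time between markers. */
    #include <unistd.h>

    static void
    mark (const char *message)
    {
            access (message, F_OK);   /* deliberately fails; we only want the log entry */
    }

    int
    main (void)
    {
            mark ("nautilus: initializing icons");
            /* ... the expensive work would go here ... */
            mark ("nautilus: icons initialized");
            return 0;
    }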
So I just call that in a bunch of places, and every single one of those access calls fails. Then my little Python program looks for those failed access calls in the log. If you take the time to write little tools like that to instrument a program, you'll have a much easier time — as I said, I ended up with about 30 of these charts and they were really useful.

So what do you do when you're profiling things? You have to take the same measurements over and over and over again. Especially when you're doing a lot of I/O, things get very unpredictable — the Linux kernel is a piece of shit when it comes to I/O, so it gives you really unpredictable results. You have to run things many, many times and do some statistics to get meaningful results; otherwise one time you measure 4 seconds, the next time you measure 7 seconds, and it doesn't make any sense. So you take statistics across a set of results. And you have to take the time to make it automatic, because if it takes two minutes of typing things by hand to get one single set of measurements, you're going to do it twice, then you're going to get bored and go do something else. Make it automatic: even if it takes you five hours to write a little program to run the benchmarks for you, it will be much better in the end.

What do you measure? You want to measure fine-grained, well-defined tasks. A lot of GUI programs have the same problem here because they are very, very asynchronous. It's not as simple as measuring the time from here to here in the source code, because that code will set up idle handlers, and those idle handlers will set up timeouts, and those timeouts will launch threads, and the threads will later notify other threads when stuff happens — you never have any idea when it all actually finishes. Nautilus does things like loading the file list in an idle handler, and then, also asynchronously, repainting the whole window in another idle handler, so you never know when that stuff is finished. So what do you do? You have to somehow insert a checkpoint for when you're done. For example, in your repaint function you can check your list of pending things to do: do I have any files still waiting to load their icons? Do I have any icons still waiting to be scaled? Do I have any whatever? And you don't emit the checkpoint until all those conditions are met.
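Here's a very rough sketch of that idea — the pending counters and function names are invented for illustration; in a real application they would live wherever the asynchronous work is tracked — reusing the access() marker from before as the checkpoint:

    /* Emit the "fully done" checkpoint only when every kind of pending
     * asynchronous work has drained.  The counters get incremented wherever
     * work is queued and decremented in the callbacks that complete it;
     * the repaint path would call maybe_emit_checkpoint() after each pass. */
    #include <unistd.h>

    static int pending_icon_loads;    /* files still loading their icons  */
    static int pending_icon_scaling;  /* icons still waiting to be scaled */
    static int pending_repaints;      /* repaints still queued            */

    static void
    maybe_emit_checkpoint (void)
    {
            if (pending_icon_loads == 0 &&
                pending_icon_scaling == 0 &&
                pending_repaints == 0)
            {
                    /* shows up timestamped in the strace log, like the other markers */
                    access ("nautilus: window fully drawn", F_OK);
            }
    }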
If you have never profiled an application — if you have just been adding code to it all the time — it's very likely that the problem is simply that your application is doing too much work. For example, gedit, the simple, nice text editor, had this problem where the search-and-replace function was really, really slow. If you loaded a document with 20 pages and hit replace-all, it would take several seconds to go through it — and it's just scanning text; there's something wrong with that. When we ran the profiler, we saw the following: for every match, gedit's search-next code was changing the sensitivity of the menu item you use to search for the next match, so gedit could show you whether there were any more matches by disabling that menu item. And of course, changing the sensitivity of a widget makes it repaint. So every single occurrence of the text you were searching for was causing a repaint of the menu items. The fix is to have the search function just collect that information — are there any matches left? — while it scans, and only update the widgets once, at the end. Do you see? Way too much work for a simple operation.

If you have never profiled a program, the next thing you'll probably notice is that your application is doing too much I/O. Once you have removed all the excess work, once you've gotten rid of all the crap, then you start to see the real problems: too much I/O, then bad algorithms, then smaller stuff like memory access patterns that thrash the cache. Only at the very end do you get to the really tiny things — oh, you should be using better instructions here, or different compiler flags — because that never matters at the beginning; there's way too much other stuff in the way.

Here are some examples of bugs we have found; if anyone wants to fix them, feel free. The method is this. You have to characterize the problem: my problem is that this pop-up menu takes half a second to appear — from the time I click the mouse to the time it appears on the screen is half a second, and that's way too much. Then you make a hypothesis, and you make it by asking the profiler, not by intuition, because our intuition is usually wrong. You ask the profiler and it says: ah, to create this menu I have to load these 10 icons. Your hypothesis is: if I didn't load those icons, it would be, you know, ten times faster. Then you run the experiment: okay, I'm going to rip out the code that loads the icons and see how long it takes, to check whether my theory was right. If it really is faster, you've found the problem and you just have to fix it properly. And you have to change one thing at a time — that's the control panel of the Mark 1, one of the first computers, there on the slide. You have to restrain yourself from changing many little things at once: oh, I'm going to remove this code and also move this other thing while I'm at it. No. Change one thing at a time. Then you confirm it, fix it, and check: did you reach your goal? Are you done optimizing or not?

So why do these disasters happen? Why is our performance so horribly bad? Programming is like building a sand castle. Programming is funny because you can start with a little experiment and keep piling things on top of it, so you end up with an inverted pyramid, and the computer lets you hold it steady. That never happens in the real world — in the real world people build bridges by putting two big piles of columns in the ground first and then building the bridge on top. Programming makes it very easy to build things out of thin air, so we keep adding and adding and adding code to our programs. Then the sea comes along, and what happens to the sand castle? How do we fix that — keep adding sand? That's not going to work; the next time a wave comes along, the whole castle goes to hell again. We have to do more engineering instead of just piling stuff on top. We actually have to measure things, establish our performance goals, fix things to reach those goals, and then have the infrastructure to make sure those goals keep being met. We have to have the benchmarks, and we have to run them frequently.
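And here is a sketch of the "run it many times and do statistics" part — again with GLib's GTimer and a dummy task_to_measure(); a real benchmark would also want to worry about warm versus cold caches:

    /* Run the same measurement several times and report min/mean/max, so one
     * unlucky run full of disk I/O does not throw the numbers off completely. */
    #include <glib.h>
    #include <stdio.h>

    #define RUNS 10

    static void
    task_to_measure (void)
    {
            g_usleep (30 * 1000);   /* stand-in for the real work */
    }

    int
    main (void)
    {
            GTimer *timer = g_timer_new ();
            double min = G_MAXDOUBLE, max = 0.0, total = 0.0;
            int i;

            for (i = 0; i < RUNS; i++) {
                    double elapsed;

                    g_timer_start (timer);          /* restarts the timer */
                    task_to_measure ();
                    elapsed = g_timer_elapsed (timer, NULL);

                    total += elapsed;
                    if (elapsed < min) min = elapsed;
                    if (elapsed > max) max = elapsed;
            }
            g_timer_destroy (timer);

            printf ("%d runs: min %.3f s, mean %.3f s, max %.3f s\n",
                    RUNS, min, total / RUNS, max);
            return 0;
    }

Printing min, mean and max — or better, keeping the whole history the way the Cairo suite does — is what lets you notice when a version suddenly gets slower.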
And whenever they start running more slowly than we intended, we know there's a problem. A while ago we discovered that someone had changed the caching scheme in the font code to cache the rendered glyphs. Before that, most languages were about equally fast — or equally slow. After the caching change, everything was okay except Japanese, Chinese, and Korean: those became 300 times slower. Why did nobody notice? Because nobody ran the benchmarks. I'm sure someone in China was really happy about that.

For example, in 1997 our scheme for loading icons was really simple. The load-icon function took an icon name; we built a file name just by prepending /usr/share/icons or whatever, adding name.png, and loading the image. Done. It was really, really simple. Then the artists started going crazy and wanted configurable icon paths and a bunch of other stuff, so now our icon loading looks like this: file names can have different extensions, and there can be many, many icon paths, and for each possible extension and each possible path we look for the file. If we finally find it, we ask, is it an SVG? Okay, then load the SVG, render it, scale it. If it's a bitmap, load the bitmap, then scale it, then take the scaled image and add some cute shading to it to match the theme. It's just such a pile of crap. This is the GNOME icon path on my machine. This part comes from the distro — the distro wants to be able to load icons from /usr/share/images blah blah blah, KDE blah blah blah; KDE is not even installed on my machine, I don't know where the hell those entries came from — and finally the GNOME ones start. Every time my machine has to look for an icon, it has to go through all those paths. Why the hell did we make it so complicated?

So why do disasters happen? People just start writing a program and calling our beautiful APIs. And to people, an API is like a flower — a pretty little flower in a pot. People see the flower, and it says load_icon; you probably can't read it there because of the low resolution, but that flower pot says load_icon. People don't know what's going on underneath that load_icon function. If we look closer — if we look under the soil — we see what the hell is going on inside the pretty functions we give them to use. They don't know that the monster down there is going to bite them in the ass. They don't know that the monster wants to scan a million paths for a million extensions, and that it's going to load SVG files and render them and scale them and shade them. People just don't know that.

What if our documentation said things like this? Here's the function we use to compute the hash code for a string: it's really fast, it just goes through each character of the string once. What if we said: this function is really fast, its running time is proportional to the length of the string, you can call it without worrying. However, the performance of this icon function is completely unpredictable, because it may walk through a thousand paths, read images from disk, even run a helper program to get the images for you. That would already help, if people wrote documentation like that. And what if we made these annotations machine-readable? For example, what if we put some dirty little XML in front of each function declaration and had a program read it?
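Purely as a hypothetical illustration — this markup is invented, not any existing standard; the string-hash function is presumably something like GLib's g_str_hash, and load_icon is a made-up stand-in for the icon-loading monster:

    /* A header file with made-up, machine-readable performance annotations. */

    /* <performance cost="fast" bound="proportional to strlen (v)"/> */
    guint        g_str_hash (gconstpointer v);

    /* <performance cost="unpredictable"
                    reason="scans every icon path and extension, may parse SVG,
                            hits the disk, may run a helper program"/> */
    GdkPixbuf *  load_icon  (const char *icon_name, int size);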
So that would be in the header files installed on your system. Then there's the code you're writing in your text editor — say a paint_icon function or something: you compute the icon name, you load the icon, you paint the image. If we had machine-readable annotations, your friendly text editor could highlight that code and say: you wanted this to be a fast function, but that's not going to work, because you're calling a really slow function right here. This is not my idea — it comes from one of the performance experts at Microsoft. They are actually adding this kind of annotation to their .NET libraries, and they are going to make Visual Studio do that kind of thing. So we have some catching up to do. That would be a really cute hack. Michael Meeks was asking about something like that yesterday — that would be really lovely.

The question was: what about tools to detect things like page faults, which you cannot really control? You can control when you load a file, but you cannot control when your program page-faults and the kernel has to go to the disk to fetch your data. I'd love to have those kinds of tools. For example, we have the problem where you use your machine for a while, then Mozilla goes and eats all your memory and everything else gets paged out — what happened there? Why is my program slow again? We need a way to know: this is the code that got paged out. Look, I was running OpenOffice just now, and I bet you that opening this menu is going to be really slow even though I used it right before I suspended my machine. I just clicked on it — it took, what, about a second? The next time it's really fast. What happened? It had been paged out. But there's no way to know that currently. So we need hooks in the kernel that can tell us: these are the VMAs that got paged out, those VMAs correspond to these libraries — and then, once you know the address within those libraries, you can see: ah, the GTK code that paints menus got paged out, or it was all the icons you had in memory that got paged out. Yeah, we'd love to have those tools.

Yes? The question is how to get people interested in doing this kind of work. Well, actually, doing profiling is a lot of fun. It's a lot of fun, but it's the kind of fun that takes some getting used to, because the tools we have are really bad, and when you don't have the right tools it's just too cumbersome to get started — sprinkling access() calls, collecting your logs and going through them by hand is a pain in the butt. So someone needs to write better tools, and someone needs to figure out what kinds of tools we actually need. Maybe we also don't complain enough when speed disasters happen; maybe we just assume somebody will fix it, and nobody does. I'm pretty sure people would fix things if they corrupted data; maybe they just don't care as much about speed. Maybe we have to reframe it and say: this program is really not fucking usable when it's this slow, so we actually have to fix it.
It took a while for people to get used to the idea that we have to make things usable — the folks at Sun pounded on that same idea for a couple of years before everyone else picked it up — so maybe we just have to keep beating people over the head for a couple of years before they start caring about making things fast.

Yes? The question is whether such a metadata approach — the last thing I showed — is really the way to go, because you can't completely express the speed of things that way, and you need a lot of correct information everywhere or the system breaks down; maybe the effort would be better spent on, for instance, automatically placing synchronization points and logging points so that you can profile well. Well, this is a preventive measure, not a corrective measure, so if it could keep us from creating some of these disasters, that would be good. And yes — if the annotations are wrong, you are totally screwed.

I think one pattern we need to start using more is this: often you queue things up in an idle handler, or push them into an asynchronous operation or whatever, but there is no easy way to know when everything is done. For example, in GTK a lot of the time we have GTK do something in an idle handler, but there is no signal on any object that tells you when that thing actually finishes. If we look for those places and start adding, I don't know, GObject signals like "operation finished" in various parts of the stack, that would help, because then you could hook your logging into them. Sure, that would be good — in general you have to find those places and add some notification there. And in the end you may need a thread, and the way the thread tells you it's done is through some synchronization that ends up happening in an idle handler anyway.

That actually brings up a good point. I'm pretty sure that every single time they ran into a performance problem, they said: oh, let's create another thread and do the operation there. That's just hiding the trash — the work has to happen regardless of where you do it — but they created another thread, put everything there, and hoped the user wouldn't notice. The user does notice: the stupid window takes five seconds to open. This is how long it takes to open a window now — I'm going to open my folder here, click, click... Compare that with just listing the directory: that only reads every file name and prints it to the screen, it doesn't try to load icons, it doesn't try to lay out fancy fonts on the screen — it's a much simpler thing. But if you run Windows Explorer, it's a lot faster than any of the crap we have right now; they have actually done their homework. As for the performance problem of putting operations in a thread: if you don't, it's just going to be synchronous, and your GUI won't respond while you're doing the work. It's a lot easier to measure when things are not asynchronous, but often that's hard to change, because the rest of the program assumes it — so you really have to take your time there.
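Here is a tiny sketch of that "tell me when the idle work is actually finished" pattern — plain GLib, with a callback standing in for what would ideally be a proper GObject signal like the hypothetical "operation-finished" mentioned above:

    /* Process a work queue in an idle handler and fire a notification when it
     * is empty, so a profiler (or anything else) can hook the completion point. */
    #include <glib.h>
    #include <stdio.h>

    typedef void (*FinishedFunc) (gpointer user_data);

    typedef struct {
            int          items_left;   /* stand-in for the real pending work */
            FinishedFunc finished;
            gpointer     user_data;
    } Operation;

    static gboolean
    process_one_item (gpointer data)
    {
            Operation *op = data;

            op->items_left--;              /* do one chunk of the real work here */

            if (op->items_left > 0)
                    return TRUE;           /* keep the idle handler running */

            op->finished (op->user_data);  /* everything drained: notify listeners */
            return FALSE;                  /* remove the idle handler */
    }

    static void
    log_finished (gpointer user_data)
    {
            GMainLoop *loop = user_data;

            printf ("operation finished\n");   /* a profiler could timestamp this */
            g_main_loop_quit (loop);
    }

    int
    main (void)
    {
            GMainLoop *loop = g_main_loop_new (NULL, FALSE);
            Operation  op = { 100, log_finished, loop };

            g_idle_add (process_one_item, &op);
            g_main_loop_run (loop);
            g_main_loop_unref (loop);
            return 0;
    }

One of the access()-style markers from earlier could hang off that notification to timestamp the moment the operation really completed.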
Yes? The question is whether I tried OProfile, the kernel-based profiler. I tried using OProfile and I found it ridiculously hard to use. One thing about profiling is that you want to get started really quickly, so the usability argument applies here too — we actually need to do some usability work on our development tools. Maybe OProfile works very well, but it's too hard to figure out; I don't know whether even the kernel people use it much. People really love sysprof because you run it, you hit start, you run your program, you hit stop, and boom — you can see all the results immediately. With OProfile it's more like: now set up this configuration, hmm, that didn't work, let me look for another option. It's really powerful — you can collect a lot of statistics from the whole system — but it's really hard to get going.

Yes? Did I try [inaudible]? I've tried some of these tools on the software on my machine, looking for performance problems — but only some of them.