So, for those of you who don't know me, my name is Ashod, and I also go by Ash. I'm going to be talking a little bit about optimization, and this talk is mostly driven by the needs of Online. The general overview of the talk is themes: certain things that we found were costing us time or memory bytes and that we wanted to improve. So this is not going to be low-level or very technically involved, but there will be a bit of technical detail. The problem, from a high level, is that the user is interested in two things. They want things to happen as quickly as possible, and they want things to scale well, meaning, in the context of the online world, that the more users you can support on given hardware, the more costs you save and the better the user experience. So it's real dollars at the end of the day, globally. It's very simple from the user's perspective to say "this is not loading fast enough", and when you ask how long it is taking, they typically can't quantify it precisely, because they're not very technical; counting spinning-wheel cycles would be one measure. From an engineer's perspective, it's open-ended: "it doesn't load fast enough", "I can't host as many users as I need" — it's very, very vague. So it is challenging, and we've got to start somewhere. So, what do you do? For the load time and the memory we have two different problems, so maybe we separate them, and that's what I'm going to do: I'm going to talk about them separately. But generally speaking, we need to measure things, right? For both of these issues we need a baseline. How does the load look when you break it down into stages, say, and which stage is really taking the most time? But even then you have an interesting problem: which document do you use to measure? Because that's going to decide the result, right? Suppose you pick a document that seems obvious to you — the presentation you've been working on for some time — and you optimize for that. Well, the user has a set of documents that are completely different, and then your optimization is of no use to them. So it's really important to focus on the target: the target platform, the target documents, the applications the users are interested in most. If it is Writer more than Impress, you have to pay that much more attention to Writer than to Impress. Similarly, you have to be conscious that sometimes there are conflicting goals: you might optimize for one thing and find out you've actually pessimized another case, and you have a performance regression somewhere else. You have to be very mindful that your changes will not be easy to test in all possible scenarios, so you have to think about what code might be using a particular function or feature. In terms of tools, we rely heavily on Valgrind: we have Callgrind for CPU performance timings and Massif for memory footprint profiling. I won't go into details; I will just say that these tools are what we rely on most of the time, but not exclusively, and there are other tools you can turn to that might give you more information.
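To make the "break the load into stages" idea concrete, here is a minimal sketch — not LibreOffice code, and the stage names are hypothetical — of a scoped timer that reports how long each stage takes:

```cpp
#include <chrono>
#include <cstdio>

// Minimal RAII timer for breaking a load down into named stages.
class StageTimer
{
public:
    explicit StageTimer(const char* name)
        : m_name(name), m_start(std::chrono::steady_clock::now()) {}
    ~StageTimer()
    {
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - m_start).count();
        std::fprintf(stderr, "stage %s: %lld ms\n", m_name,
                     static_cast<long long>(ms));
    }
private:
    const char* m_name;
    std::chrono::steady_clock::time_point m_start;
};

int main()
{
    { StageTimer t("parse");  /* parse the file */ }
    { StageTimer t("layout"); /* lay out the pages */ }
    { StageTimer t("render"); /* render the first page */ }
}
```

Something this simple already tells you which stage to point the profiler at; the Valgrind tools then give the detailed breakdown.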
So, overall, the idea is that we want to be fast and very responsive, and we have to be mindful that the user is not just holding a stopwatch; the user wants something responsive, right? If they get to see the document quickly, even if it is not fully loaded, even if they can't quite edit it yet, that is much, much better than showing them something completely ready and fully loaded, just too late. Perception matters for users. Similarly, memory is very important, in the sense that if you are doing a lot of aggressive caching for performance — meaning you want the data that is expensive to compute to be ready to reuse over and over — you have to be very mindful of the fact that you are spending another resource, memory, to make your CPU work less hard. Between these two there is usually a trade-off, and you need to find it, because those things don't come for free. There is a balance somewhere that we need to find. Since this talk doesn't have natural screenshots like my previous talk, I took the opportunity to share with you some of my nature shots, which conveniently have no copyright issues. On the loading front, one interesting thing we discovered is that users were complaining that the desktop was much faster at loading the very same document than Online. So what is going on there? One clue is that we don't want to do initialization for every document over and over again, and Online gives us this ability, because behind the scenes what we do is create one process instance in memory and then fork from that. So what we should be doing is collecting all the initialization bits and doing them before the forks happen. That is easy, except every so often you find you're up against a corner case, and this happened with localization, spell checking, and font loading for fonts that are often used, like Asian script fonts. If a document happened to hit those corner cases, it ended up paying the high price of loading all that information at load time. So that was an easy fix, and it saved both load time and potentially even memory, because not every document had to have its own copy of the initialization data. Remember, if you do this before forking, the pages are shared in memory. So, double win. This next one is a more interesting case that I'm going to spend a couple of minutes on, where we had a discrepancy between the desktop and Online. In fact, Online was almost twice as slow to load the very same document. We had two documents: one was about six seconds versus three, that's Online versus desktop, and the other was 35 versus 17. So, what's going on there? This is bizarre, right? What are we doing twice over in Online? There was nothing obvious as the answer, except the symptom was highly reproducible, so it wasn't a fluke. It turned out that the SolarMutex, which is the thing that the whole threading world revolves around — forgive the pun — is very much a scarce resource, and whoever holds it essentially controls the whole LibreOffice timeline. And the interesting thing we found out is that you can get into some very interesting situations with it.
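A minimal sketch of the pre-initialize-then-fork idea, with hypothetical function names rather than the actual Online code: everything touched in preInit() lives in pages the children share copy-on-write, so both the CPU and the memory cost are paid once.

```cpp
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical one-time setup: localization data, dictionaries,
// commonly used fonts, etc. Pages touched here are shared
// copy-on-write by every child forked below.
void preInit()
{
}

int main()
{
    preInit();                // pay the initialization cost once

    pid_t pid = fork();       // one child per document in the real setup
    if (pid == 0)
    {
        // Child: serve exactly one document, reusing the shared,
        // already-initialized state instead of redoing the work.
        _exit(0);
    }
    waitpid(pid, nullptr, 0); // parent waits (real code keeps forking)
    return 0;
}
```

The corner-case bug described above amounts to work that should have been inside preInit() but was still happening after the fork, in every child.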
So, roughly, what is happening here: these are the two main pieces we need to worry about. This is Online, the part the user is interacting with; this is the LibreOfficeKit wrapper; and that's the idle thread for doing the idle jobs in the background. The key thing here is that when you do the loading, the load function schedules some background jobs to finish up the layout and pagination and all the fun things that happen in the background. If you have a 300-page document, you do actually want those things to happen while you're starting to look at the first page, right? And that scheduling happens right at the end of the load function. The load function returns, so the whole Online side of things says: all right, great, we're good. Now let me set up the user. It needs to figure out the view for the user and initialize the rendering machinery — a little bit of work — and immediately after that it starts actually rendering the tiles to show to the user in the browser. Extremely simple, right? Except when you hit the rendering, which is time-consuming, it ultimately depends on the SolarMutex, so you will fight for it. And if there is somebody else who needs it as much as you do, that's exactly what happens. Once the load returns, while you are doing the initialization and heading for rendering, the idle jobs kick in — naturally, because, you know, nothing else is running, so the thread looks free. And so they start doing all the pagination and the layout and the fun things in the background, which take almost as long as the load function itself took; it just roughly happens to be equivalent, as if somebody had divided the loading into two equal parts. And so what happens is the SolarMutex is taken, we block, and we can't render anything. From the user's perspective, all of this is "loading", when in reality we had finished loading — there was nothing else to do from a document-loading perspective. We just couldn't render the document, so there was nothing to show to the user. The user was getting a blank page for 35 seconds, and obviously they weren't happy. This is a classic, textbook case of priority inversion. Fixing it is not trivial, and there are several possible solutions. One is to say: okay, let's actually have the concept of priorities. Idle jobs get low priority and rendering gets far higher priority because, you know, that's more important, and whenever I need to render, I can preempt. Except you'd need to rewrite essentially the scheduler and all the surrounding machinery, which is a non-starter, right? We don't want to do a major rewrite. There are other solutions, but it turned out that an easy fix is just to tell the idle jobs not to kick in immediately after loading. So what we do is schedule the idle job with some delay, just to give Online enough time to initialize itself and finish whatever it needs to start rendering at least the first page, and then the idle job can kick in. There are two things to note. One, we do this only for the Online path, meaning on the desktop there is absolutely no change in behavior.
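As a sketch of the fix, assuming a hypothetical DelayedIdleJob type (the real change lives in the LibreOffice scheduler): the job stays quiet for a grace period, but can always be forced to run when its results are actually needed.

```cpp
#include <chrono>
#include <functional>

using Clock = std::chrono::steady_clock;

// Sketch of "delay the idle job, but run it on demand".
class DelayedIdleJob
{
public:
    DelayedIdleJob(std::function<void()> work, std::chrono::milliseconds delay)
        : m_work(std::move(work)), m_notBefore(Clock::now() + delay) {}

    // Called from the idle loop: do nothing until the grace period has
    // passed, leaving the SolarMutex to initialization and tile rendering.
    void maybeRun()
    {
        if (!m_done && Clock::now() >= m_notBefore)
            runNow();
    }

    // Called when the user actually needs the result (e.g. scrolls past
    // what has been paginated): no cheating, the work runs immediately.
    void runNow()
    {
        if (!m_done)
        {
            m_done = true;
            m_work();
        }
    }

private:
    std::function<void()> m_work;
    Clock::time_point m_notBefore;
    bool m_done = false;
};

int main()
{
    DelayedIdleJob layout([] { /* paginate the remaining pages */ },
                          std::chrono::milliseconds(500));
    layout.maybeRun(); // too early: skipped, rendering proceeds unblocked
    layout.runNow();   // user scrolled ahead: forced to run now
}
```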
But more importantly, even in Online there shouldn't be a change of behavior, meaning that if a user tries to, let's say, scroll further than the idle job has gotten, they will still trigger these functions to execute and they will get the normal, paginated layout results. We're not cheating at all. What we're saying is: normally, delay yourself; and if we really need you, we will force you to do the work for us. This next one is an incidental thing. I just happened to stumble on a couple of issues with GTK that were not really part of the Online effort I was doing, but I fixed them anyway. The point is, sometimes you accidentally improve something your task doesn't necessarily require. Quickly: one was that, it turned out, in some cases GTK repeatedly gives you font-update event notifications, and we happened to have a broken case of caching. Whenever we consulted the cache to see what the last style was and whether we actually needed to update anything, since the cache was never updated, we always thought we were out of date, so we always triggered a font refresh, and that cost us significant cycles. The only reason I noticed is that I was running on battery on my laptop, and every time I ran LibreOffice it would just drain my battery. However, it didn't happen if I had moved the window or done anything to disrupt the initial state of GTK. The other one was that we were using excessive memory in a particular case, and that got fixed as well. So next I'm going to talk about memory optimizations. It turns out that in many cases there is one big beast that, if you find and tackle it, will actually make a significant impact, as opposed to trying to save a few bytes here and there, where the overall effort is just too much for the wins. Going for the big fish is usually a good idea. And for those who like this picture, you can try and guess what it is. I've never told anyone yet, and it's been a few years now, so it's fun. So, the easy approach is to say: okay, I have all these data structures that I create many, many instances of in memory, and I try to see how I can minimize their footprint, because if I create millions of instances of some object, then even if I save one byte, the cumulative saving is still very large. That's the concept here, and it turned out that for certain types of files you get a very large number of certain objects, disproportionately more than others. PDF has something called a graphic context, and we need to map each graphic context to a unique ID; incidentally, we need to look at this map both ways, going from the graphic context to the ID and in other cases back. This is needed to decide what to render on the screen, because the PDF format has an object table where it keeps essentially one copy, one stream, of every element. It really doesn't need many copies of the image that happens to be your logo; it has one copy of it with a unique ID and just stamps it on every page at the right coordinates. So it has a lot of functionality. However, we stored two copies of the map, one for each direction, and that was consuming about 160 megabytes. Combining the two into a bidirectional map — because you don't want to duplicate your data — saved us almost 31% of just that map in this specific case, and if your users have a large volume of PDFs, that does make a difference.
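Here is a minimal sketch of the single-copy bidirectional map idea, with std::string standing in for the real graphic-context payload (something like boost::bimap can achieve the same; this is not the actual code):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using ObjectId = std::uint32_t;

// One copy of the data, two views onto it: forward lookup via the map,
// reverse lookup via pointers into the map's stored keys.
class GraphicBiMap
{
public:
    ObjectId add(const std::string& obj)
    {
        auto [it, inserted] = m_toId.try_emplace(obj, 0);
        if (inserted)
        {
            it->second = static_cast<ObjectId>(m_byId.size());
            // unordered_map never moves its elements, so pointers to the
            // stored keys stay valid; the reverse view costs one pointer
            // per entry instead of a full second copy of the data.
            m_byId.push_back(&it->first);
        }
        return it->second;
    }

    ObjectId idOf(const std::string& obj) const { return m_toId.at(obj); }
    const std::string& objectOf(ObjectId id) const { return *m_byId.at(id); }

private:
    std::unordered_map<std::string, ObjectId> m_toId; // forward view + data
    std::vector<const std::string*> m_byId;           // reverse view, pointers
};

int main()
{
    GraphicBiMap m;
    ObjectId logo = m.add("logo-image-stream");
    return m.objectOf(logo) == "logo-image-stream" ? 0 : 1;
}
```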
I gave a talk yesterday; those who missed it can probably see the recording. Very briefly, the work was about making sure we can render PDFs very quickly and very accurately while the user edits. In that context we worked hard to get PDFium integrated, and we use PDFium to render the PDF pages into bitmaps. The tricky bit, however, is that we had to not just render and be done, but to do that and work well with the graphics layer above. So the overall reduction in memory wasn't necessarily significant immediately for all documents — though those who have seen the results from my last talk will remember there was a major win there — because we were trying not to add any overhead. Related to that, we had to deal with Cairo and the headless backend, which sit very close to the rendering of all the graphic elements on the screen, at least for headless, which is what the Online work uses. What we found was that there were multiple cases where we could improve the cache, so we went in and reworked the logic. Two things were effective. One is that we are now more aggressive in evicting unnecessary old or large cache objects. It turned out that in many cases we had objects of a few hundred bytes in the cache, and in some cases objects of a few hundred kilobytes, so it's not really well balanced. If your cache is one megabyte or four megabytes, whatever it is, and you have only five large objects in it, it's not a very effective cache, because you've just evicted several thousand smaller glyphs and smaller graphical objects, and re-creating those is going to be very costly; then you'd push them back into the cache and eventually either overshoot your cache size or end up not benefiting much from caching at all. So we changed the logic a little: we now do more aggressive eviction, and we try to get rid of the larger objects first. Those changes were very beneficial, and it turned out that even though we are more aggressive in trying to keep memory below the threshold, there was no measurable CPU overhead. Even though we do a little bit of extra work, we improved CPU cache locality and other things, which bought back some CPU cycles. So that was a very positive story. One of the pieces where, if you actually go for it, you gain a lot, was documents with a lot of heavy graphical elements: images, presentations, or even text documents that have a lot of screenshots, say. What happened, apparently, is that Cairo was adamant about taking all the images in one format, which was full 32-bit ARGB. But that was really not necessary, because if your image data is really RGB and you don't have any alpha transparency, then that extra channel is a waste of about 33% on top of your actual data. So this was a multi-stage kind of effort: first we had to render the images in a format Cairo was happy with, or do the conversion; then there was a patch specifically for Cairo to support 24-bit RGB, so that we avoid any conversions where possible. In the end, in memory we just want to keep RGB, so once you have RGB in memory — 3 bytes per pixel only — and the rendering engine takes that without conversion and renders it without any overhead, you are optimal in memory and CPU at that point: not doing any extra work, not keeping any extra bytes. And this was a huge win on documents that have a lot of graphics.
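A minimal sketch of the largest-first eviction policy under an explicit size budget (illustrative only, not the actual VCL/Cairo cache code):

```cpp
#include <cstddef>
#include <map>
#include <string>

// When over budget, evict the single largest entry first, so thousands
// of small, expensive-to-recreate entries (e.g. glyphs) survive.
class SizeAwareCache
{
public:
    explicit SizeAwareCache(std::size_t budget) : m_budget(budget) {}

    void insert(const std::string& key, std::size_t size)
    {
        auto [it, added] = m_entries.try_emplace(key, size);
        if (!added)
        {
            m_used -= it->second; // replacing an existing entry
            it->second = size;
        }
        m_used += size;

        while (m_used > m_budget && !m_entries.empty())
        {
            // Find and drop the largest entry.
            auto victim = m_entries.begin();
            for (auto i = m_entries.begin(); i != m_entries.end(); ++i)
                if (i->second > victim->second)
                    victim = i;
            m_used -= victim->second;
            m_entries.erase(victim);
        }
    }

private:
    std::size_t m_budget;
    std::size_t m_used = 0;
    std::map<std::string, std::size_t> m_entries; // key -> payload size
};

int main()
{
    SizeAwareCache cache(4 * 1024 * 1024);          // 4 MB budget
    cache.insert("glyph:a", 200);                   // tiny, cheap to keep
    cache.insert("bitmap:slide1", 3 * 1024 * 1024);
    cache.insert("bitmap:slide2", 2 * 1024 * 1024); // evicts slide1, not the glyph
}
```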
So this is another one; I call it "pearls", but it is actually just table salt, not even large salt crystals. I hope the screen is clear. All right, so, approaching the end of my talk, this is the bottom line; this is what the effort was all about. These are numbers for a wide variety of documents; they are representative of at least our customers who are interested in improving the memory profile. For the load time it was fairly straightforward: we just cut the Writer loading time in half, and for that we didn't need any measurement, you could just see it on the screen. Here, the story is that the documents with a large win percentage-wise were benefiting from the image and graphics work, where we went from 4 bytes per pixel to 3 bytes per pixel; in both those cases we are sitting at around 30% savings, which is quite impressive because that's pretty much the image overhead. You can also see that some of the others, like the Pythagoras one at 15%, are mostly saving on the overhead of the PDF bidirectional map. But also, in the case of rendering PDFs, you are essentially reducing your document from a very large number of individual elements on the screen to just a single fixed-size image, so your cost is essentially fixed per page of the PDF document. And you can see a very similar story in the presentations; part of that also has to do with the Cairo, bitmap caching, and image improvements, and those apparently did benefit quite a bit across the board. Overall, about 12.5% for a representative mix of documents from our customers. So with that, I'm open for questions. I realize we're running a little bit over, but if anybody has a quick question I can answer it; otherwise I'm happy to talk offline.