So, Photon performance, what is this? You heard an hour ago about Firefox Quantum. Quantum completely changed how the back end of Firefox works; we have a much faster engine. And the problem we had is that nobody was going to notice, because everybody knows that Firefox is slow. Even though it's... I think I will just unplug this. Everybody knows that Firefox is slow, and we need people to try Firefox again. The way to get them to try Firefox again is, on one hand, to change the appearance: if it looks new, people will try it again. And also to ensure that it's not slow, because you can make displaying web pages as fast as you want; if there's something slow in the UI that you use to interact with those web pages, it will still feel slow. So this is what Photon performance was about, and we were a very small team: two and a half engineers, and we had six months to do this. Here's what I'm going to cover today. First, I will give the UX perspective; when we think about performance, we rarely think about the designer's point of view, which was pretty interesting in this case. Then I'm going to explain how we approached the problem as engineers, give examples of what we actually did, and then talk a little bit about future projects and what we learned from this project. So, the UX perspective. It turns out we had a designer on our team, which was kind of surprising but also very nice for a performance project. It also turned out that I was in the same room as that designer when we started the project, and we had early discussions about how we should approach this. The designer told me: "So, we need to make Firefox feel faster, and there's nothing we can do to make it actually faster, because otherwise engineers would have already done it. So we need to cheat, right?" That was kind of a surprise for me, because for me improving performance was all about making Firefox actually faster, and it turns out it was not.
So, from the designer's perspective, what was important was to be faster as measured by clocks, but especially as measured by human perception, because performance is really a perception. It's something subjective, and our designers studied this a lot. They did lots of research, including benchmarking things with real humans in front of various versions of browsers, to compare and see which one users would prefer. They identified the three most important things. One is responsiveness; the others are fluency and duration. Responsiveness is making things react as quickly as possible when the user interacts, and by "as quickly as possible" the designers just say "immediately", whatever that means. But there are plenty of things you can do in a browser, and obviously we couldn't make everything immediate. Fluency is another one: it's making it look like the machine is not working hard. For example, while you are scrolling a page, if it stops somehow and then starts again, it looks like something is working really hard in the background, that your machine is not performing well or the browser is slow, and we really want to avoid that. And duration is just making things take less time. Not for everything, just for the important things, so we had to identify what matters, what the important things were. And actually, because these ideas came from a designer, things didn't have to actually take less time; they just had to seem like they were taking less time. So, for example, if you are loading a web page, it doesn't matter how long it takes to load the web page. What matters is how much time there is between when you click to load the web page and when the web page is on screen. So, for example, what if we started loading something over the network when you press your mouse button? You've not released yet, so it's not really a click, but by the time you've released, maybe we have resolved the DNS.
We're not leaking any information, so that's okay to do. And that way we can maybe save 100 milliseconds, which is the average time it takes for someone to press and release the mouse button. So this is performance from a designer's point of view, and that was really interesting, because that's really not how I was thinking about it before. Another thing I wanted to come back to: I said the designer wants us to do things immediately. As an engineer, when I'm told something needs to be immediate, I'm thinking 1 millisecond or maybe less. But that's not what designers have in mind. They mean the limit of perception: the user will not notice if you do it any faster. We ran lots of experiments on this, and it was also very interesting; we discovered a few things. The limit of perception, that is, the point below which optimizing further makes no perceivable difference, is about 100 milliseconds. So if clicking on something and the UI appearing in reaction takes 100 milliseconds or 50 milliseconds, you will see absolutely no difference and feel no difference. If it takes half a second, you will feel that you're waiting. But most interesting to me was that if it takes 200 milliseconds, you will not feel that you're waiting, and you will not see any difference compared to 100 milliseconds. You will just feel bad about it. If you're given one product that takes 100 milliseconds and one that takes 200, you will not be able to say what the difference between the two is, but you will be able to say which one you prefer. So this is the kind of thing we care about when we say Firefox needs to feel faster: it's a feeling. This one is also about designers: they consider that if you've got to do three clicks instead of one, it's slower, because they consider the performance of the whole interaction, not just the thing that the browser actually does. And yes, performance is subjective, and they have lots of interesting things written about it.
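The mousedown trick mentioned earlier can be sketched in a few lines. This is an illustrative sketch, not Firefox's actual implementation (which does the speculative connect in native networking code); `speculativeConnect` and the fake link element are stand-ins so the example is self-contained:

```javascript
// Record which URLs we started connecting to early.
const startedConnections = [];

function speculativeConnect(url) {
  // Placeholder: a real implementation would resolve DNS and open a
  // socket here, without sending any request, so nothing is leaked
  // beyond what the eventual click would reveal anyway.
  startedConnections.push(url);
}

function attachSpeculativeConnect(link) {
  // Start connecting on mousedown: the ~100 ms until the mouse button
  // is released is free time to spend on DNS and TCP setup.
  link.addEventListener("mousedown", () => speculativeConnect(link.href));
}

// Minimal stand-in for a DOM element, so this runs outside a browser.
function makeFakeLink(href) {
  const listeners = {};
  return {
    href,
    addEventListener(type, fn) { (listeners[type] ??= []).push(fn); },
    dispatch(type) { (listeners[type] ?? []).forEach(fn => fn()); },
  };
}

const link = makeFakeLink("https://example.org/");
attachSpeculativeConnect(link);
link.dispatch("mousedown"); // user presses the button: start connecting
link.dispatch("click");     // by release time, DNS may already be resolved
console.log(startedConnections); // one speculative connection was started
```

The key property is that the speculative step has no user-visible side effects, so doing it on a signal weaker than a click is safe.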
So if you want more details about the designer's perspective, it's all published. Now on to engineering, because this is what I worked on the most. There have been lots of performance projects at Mozilla. Over the last 10 years, I've heard about someone working on performance almost every year, fixing piles of bugs, and engineers are really happy when they fix something: "It's now 10% faster, awesome." But the problem is it doesn't stay fast. It regresses, and we don't even know why. It's not because we don't care; we care a lot. We've got lots of continuous integration tests, performance tests, but it still regresses, and we'd like to understand why and stop it. So we spent a lot of energy on this project trying to fix that. Our approach to the problem was, instead of quickly finding stuff where we could get 5 or 10% wins, to spend a lot of time analyzing what was slow in Firefox, why it was slow, whether there were patterns, and what we could do to stop it. And by "stop it" I mean ensure that we notice regressions not by seeing things get slightly slower, but by backing out the offending patch immediately because it breaks a test. So we spent a lot of time in this project on new kinds of tests that cause very small performance regressions, which previously would always go unnoticed, to be noticed immediately and backed out. One example of a performance project we worked on is startup performance, which is how long it takes from when you start Firefox until you can start using it. I spent a lot of time working on this, and it was pretty difficult for various reasons. Most engineers who put code in the startup path think that their stuff really needs to be there at startup, and sometimes we have to argue about it. There's also a lot going on at startup, like I/O: we do need a lot of files from the disk to be able to start the browser, and that I/O is sometimes done on the main thread.
Like we said an hour ago, we want everything to be done off the main thread so that the main thread stays responsive, but there is still some I/O done on the main thread at startup. Last but not least, our tooling was really not great for working on startup. You probably heard a couple of hours ago about DevTools and especially the Profiler. The Profiler is a great tool to identify performance problems, but for startup you need to have the Profiler started before the rest of the browser, which is something we can do with an environment variable. Then we had to fix it so that it would actually work, but that's fine, we did. And then the Profiler tells you where you need to focus your attention: the parts where we are unresponsive are the parts displayed in red. So this is what startup looks like, and more or less everything is red here, so we had to figure out where to put our attention. This is about 20 seconds before the user can start using Firefox on this profile, which is pretty bad. This was also a very slow machine, because slow machines are good at magnifying problems. So we had to break startup down into various phases. The most important one is when we start showing something on screen, when we react to the user requesting that the browser start, because if the user waits for 10 seconds with absolutely nothing on screen, they will start to wonder "Did I really click that icon?" and they will click it again and again and again, and when it finally starts, we've got five Firefox windows. So, yeah, the first important step is when we reach first paint, and the last step is when we are ready to handle the user's events. The 20 seconds here is when we are ready to interact with the user, but first paint is the most important, and we reach first paint at the green marker here. So I can zoom in, and this is the profile until first paint, and there's still a lot of stuff going on. There's some stuff in here that's quite okay.
That's only C++ code, so this view is filtered to show only JavaScript code; this part is C++ stuff, and that's pretty okay. All of this is the add-on manager starting, especially starting WebExtensions. This is telemetry: we collect a lot of telemetry data at startup, and we start some stuff like the graphics drivers at startup to collect information about which graphics card we have, just in case we crash, so that we can record it. That's also pretty slow. Here we are starting to process the command line parameters. Here we start loading the browser window. Here it's being loaded from the disk, JavaScript starts executing in it, and here we're ready to show something on screen. All of this is pretty slow, and we can still do better. Another thing that was hard about startup is that it's hard to measure. We've got lots of performance tests that say startup takes this much time, but sometimes we get surprises. So this graph is going down; looks nice, right? We're improving stuff. And here it goes up, so that looks like a regression, but actually it was not: we discovered later that it was just the test being broken. This happened when we removed a very large image that was loaded for no reason during startup, because it was never going to be shown. And, boom, the test regressed, and we realized the test was measuring the load event, which actually measured the time until the first image download, even if it was a useless image. So here we backed out that change because we thought it had caused a regression, and here we relanded the same change after fixing the test so that it would wait until we had actually loaded the browser UI. So those are some surprises we had, and it was pretty common that we fixed something but it got backed out because someone thought it had regressed something, when it was just the test being completely broken. We also tested performance in another way, based on screen recordings; this was one of the Quantum release criteria.
So this was real humans recording Firefox starting on a computer, doing it 10 times, then looking at the video and counting frames. Each frame takes 16 milliseconds if we run at 60 frames per second, so counting frames gives you the duration, and we checked how it evolved. And this was pretty disappointing, pretty noisy, so it didn't help us much. There's one point in here that's within the target; looks like success, right? Except no: it means we were comparing with Chrome, and Chrome was three times slower that day for no specific reason. So I think we can just forget about this one. This is why we introduced different kinds of tests. I talked about the various phases of startup; we put in place tests that check those phases, and if something unexpected happens in one of them, the change just gets backed out. So that's the testing we put in place. Now, the things we actually fixed: we delayed lots of things. For example, initializing NSS, our encryption library. It reads plenty of things from the disk, synchronously unfortunately; it will be another few months before that's done asynchronously. We managed to get that delayed until after first paint. Same for the Places database, which is where we store bookmarks and history: we really don't need it before we have a window on screen. The search service too: you are not going to search the web before you have a window on screen. So there were plenty of things we could delay. There were also plenty of things that were very tiny, where the developer thought, oh, it's less than one millisecond, it doesn't matter. But if you get 30 of those, it starts to have an impact. So we moved plenty of those things to later, made some of them lazy, and got some nice improvements. The result: we cut the startup time in half, as measured by telemetry from real users. But there's another way to see it. I said using slow hardware is very nice, because you see performance problems much better.
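The "made some of them lazy" part can be sketched with a lazy-getter helper, similar in spirit to the `defineLazyGetter` pattern used in Firefox's front-end code. This is an illustrative sketch (the `services.search` object here is hypothetical, not the real search service): the expensive work runs only on first use instead of unconditionally during startup.

```javascript
// Counts how many times the expensive initializer actually ran.
let initCount = 0;

function defineLazyGetter(object, name, init) {
  Object.defineProperty(object, name, {
    configurable: true,
    get() {
      // First access: run the expensive initializer once, then replace
      // the getter with a plain value so later accesses are free.
      const value = init();
      Object.defineProperty(object, name, { value, configurable: true });
      return value;
    },
  });
}

const services = {};
defineLazyGetter(services, "search", () => {
  initCount++; // stands in for reading search engine files from disk
  return { query: term => `results for ${term}` };
});

// Nothing has been initialized yet: startup stays cheap.
console.log(initCount); // 0
// First use pays the cost...
services.search.query("foo");
console.log(initCount); // 1
// ...and later uses don't pay it again.
services.search.query("bar");
console.log(initCount); // 1
```

Individually each deferred initializer saves less than a millisecond, but as the talk notes, 30 of them together have a visible impact on first paint.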
This is a very old netbook that I found in the basement of my co-working space. Warm startup on it with Firefox 55 was taking 14 seconds; Firefox 57 starts in 15 seconds on it. This is the difference between a machine that was completely unusable and stayed in the basement, and a machine that is usable again. Still painful, but we will continue fixing this. So I will go slightly faster for the rest. This next problem is something that can also happen on the web. Sometimes when the user interface is not smooth, it's due to something we call synchronous layout flushes. Whenever you make changes to the DOM, the layout engine has to process the page again to decide what the changes do, what the new sizes of the various boxes on screen are. But sometimes you've got JavaScript that changes something in the DOM and then takes a measurement, like getBoundingClientRect on something, and that forces the layout engine to compute everything immediately. So if you change something in the DOM, then take a measurement, then change something else, and do it again, you're in for something very slow. That's something that was happening while typing in the address bar: in the auto-completion panel, some of this was happening for each of the rows, and that was very slow. So we used the same approach for this, which is to identify all the problems. For this we wrote an add-on that we asked lots of people in the community to install and report bugs from. This add-on sent us the stacks whenever synchronous reflows happened in the UI, and we got about 200 bugs filed out of it. And then we thought about how to fix the problem so that it doesn't come back. One thing is we introduced an API that gives us a way to query layout information, to take measurements at a time when we are sure it's not going to trigger any computation, so that it's always free.
And then make changes to the DOM only in requestAnimationFrame callbacks, so that again it never costs anything extra. This had a very big impact on the performance of the interface. Synchronous IPC, communication between the content processes and the main process, was already mentioned a while ago; it's also very slow, and a lot of those calls were eliminated. Timers are also a problem. This is when JavaScript code uses setTimeout. It's a problem because timers have a specific time when they need to execute, and if for some reason they can't execute at that time, they will execute at the next opportunity. Which means that if you've got some slow code that runs for 500 milliseconds, and plenty of timers that wanted to run during that time, they will all run at once and block the UI for even longer. This also happens when you wake your machine from sleep: all the timers that were supposed to execute while the machine was sleeping fire at once. So we also needed to eliminate those. Then there are other things we improved. Speculative connections: when clicking something that will cause a page load, we open the socket, negotiate SSL, and things like that, ahead of time. This can save 100 milliseconds on any page load if we do it well. We also took the opportunity of no longer supporting legacy add-ons to rewrite some large parts of our code that were slow and just ugly. For example, we had this Task.jsm library that was used to write async JavaScript functions, and now that JavaScript natively supports async functions, we could just rewrite the whole thing. This was kind of a crazy change: 40,000 lines changed at once. And it saved about 1%. That's something we can do now; it's okay to do, and we are going to do it as often as we can, because it obviously saves time and makes the code much nicer to read.
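The synchronous-reflow problem described above comes down to interleaving DOM writes and layout reads. Below is a toy model, not Firefox code: the fake "document" marks itself dirty on every write and must flush (recompute layout) before it can answer a measurement like getBoundingClientRect, and we just count those forced flushes for both patterns.

```javascript
// Toy stand-in for a layout engine that counts forced synchronous flushes.
function makeFakeDocument() {
  let dirty = false;
  let flushes = 0;
  return {
    write() { dirty = true; },                 // e.g. a style or DOM change
    measure() {                                // e.g. getBoundingClientRect()
      if (dirty) { flushes++; dirty = false; } // forced synchronous reflow
      return 0;
    },
    get flushCount() { return flushes; },
  };
}

const rows = 100; // think: rows in the address bar auto-completion panel

// Bad: interleaving a write and a read forces one reflow per row.
const slow = makeFakeDocument();
for (let i = 0; i < rows; i++) {
  slow.write();
  slow.measure();
}
console.log(slow.flushCount); // 100

// Good: batch all reads first (while layout is clean), then all writes.
// In Firefox's UI the reads go through the layout-query API mentioned
// above and the writes happen in requestAnimationFrame callbacks.
const fast = makeFakeDocument();
for (let i = 0; i < rows; i++) fast.measure(); // reads, layout still clean
for (let i = 0; i < rows; i++) fast.write();   // writes, no reads in between
console.log(fast.flushCount); // 0
```

Same number of reads and writes in both versions; only the ordering changes, and that alone is the difference between one layout pass and one per row.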
And then, when you're almost out of time, which is also the case for this talk, I was told it was too late to make any performance-related changes, because performance changes can cause regressions, so I was not allowed to land them anymore. What we did then was focus on polish bugs, and especially flickering. Because even if a thing is super fast, if something appears here, then moves there, then moves again, it feels super slow. In Firefox 55, when you start and open a browser window, you've got things that appear in one place, then toolbar icons that appear, and some more, and it causes everything, especially the location bar, to shift. Which means that visually everything looks broken for a split second. We fixed most of those problems, and the few that remain are in the process of being fixed. A few seconds to talk about what we're doing next. Switching tabs much faster: we are cheating on this with what we call tab warming. As soon as you hover a tab, we start to render what that tab will display, so that when you actually click, the switch is immediate. Closing tabs faster too. Starting faster: I said the first time we paint the browser on screen is important for startup. Something we're going to do soon is display a blank window as soon as we can, before running any of the JavaScript I was showing in the profiler, and then load the actual browser UI later. Those are all things in the process of being done right now. Other things we could do include, for example, improving session restore. It was kind of a surprise for me: some people were very, very happy about how fast Firefox 57 was, because session restore was now fast. And by fast, they meant it used to take three minutes, and now it takes 15 seconds. When someone tells me the browser is fast because it's frozen for only 15 seconds, that's kind of strange to me. I think we can do much better.
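The tab warming idea can be sketched as a small cache keyed on hover. This is a hypothetical illustration of the concept, not Firefox's real tab-switching internals; `renderTab` stands in for the expensive work of painting a background tab.

```javascript
// Cache of speculatively prepared tab content: tabId -> rendered content.
const warmed = new Map();

function renderTab(tabId) {
  // Stand-in for the expensive work of rendering a background tab.
  return `layers for ${tabId}`;
}

function onTabHover(tabId) {
  // Speculatively render on hover. If the user never clicks, this was
  // wasted work, but hovering a tab is a strong hint a switch is coming.
  if (!warmed.has(tabId)) warmed.set(tabId, renderTab(tabId));
}

function onTabSelect(tabId) {
  // If warming already happened, the actual switch is effectively free.
  const content = warmed.get(tabId) ?? renderTab(tabId);
  warmed.delete(tabId);
  return content;
}

onTabHover("tab-42");                // pointer enters the tab strip
const shown = onTabSelect("tab-42"); // click: reuses the warmed content
console.log(shown); // layers for tab-42
```

Like the mousedown trick for page loads, this is "cheating" in the designer's sense: the work takes just as long, but it starts before the user perceives the interaction as having begun.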
If you have other ideas of things we should explore, let me know; I'm very interested, and we will keep working on performance for the foreseeable future. And some general lessons we learned from this project. Engineers don't like to be responsible for slow code. So if you can prove that something is slow, and when, and you file a bug and CC that person, sometimes the bug gets fixed surprisingly quickly. I spent weeks profiling every day, filing bugs every day, and it was very common that my bugs were fixed within hours, which is not the typical experience when filing bugs. So that was pretty nice to experience. If something is hard to test, it's going to regress. So stop what you're doing and think again about how you can test it, and if you can't find a way, ask your colleagues. For all the things we did in this project, we put automated tests in place so that they are never going to regress, and I think that's the most important thing we did as part of this project. We had only six months and two and a half engineers who were supposed to fix all the performance problems, and we decided to spend half the time introducing tests. And I don't regret doing this, because I think this is what will have the most lasting impact. Designers have different ideas; we should really talk to them, it's interesting. Slow hardware, like this netbook from the basement, was wonderful. We actually ordered slow hardware on purpose so that it would match what our users experience, and lots of testing was done on that slow hardware, because developer hardware is useless for performance testing. But the netbook is really another order of magnitude slower. There are plenty of good ideas in Bugzilla; we ended up fixing bugs that had been filed eight years ago. I already talked about UX. And if it's easy to detect problems, people will notify you when they find them. So create tools that make it easy to detect problems and file bugs.
And if you've got more questions, we probably have one or two minutes? Two minutes. Yes? I heard only a third of the question, but my guess is it was about whether all the restrictions related to the recent security measures are going to impact the performance work we did. The short answer is no, because we focused on stuff that was really slow and really broken. You may be able to feel that Firefox is faster due to having things happening in different processes, but we mostly focused our time on fixing cases where the main thread, which runs the UI, was completely blocked on stuff we really should not be doing. Yes? Should I repeat? Yes, it will have an impact, but not on this part of the work, which is what I was saying, not on the UI. Yes? The bits of the question I heard were about network scheduling and how many connections we make. This is not something we worked on a lot. It matters more if we want to restore large sessions, but currently I think we only restore the tabs that are selected, so not that many requests. I think it matters more when loading web pages, and this project was more focused on the user interface. Okay, so as a conclusion: we are just getting started. This was a crazy six-month project, and there were enough results that our managers decided we should keep doing this. So I've been doing this for 10 months already, and at this point I've decided that I will be doing this for all of 2018, and we've got more people too: we were 2.5 engineers, now we are between 4 and 6. Our goal with this project was not to make Firefox slightly faster; we wanted it to be fast, but for good. It should never regress again. And it's interesting, because our marketing people actually just said this: the new Firefox is fast, for good. And that was really what we tried to do.