All right. Hi, everyone. Thank you for coming, and congrats on getting in. My name is Ilya Grigorik. I work on the Google Chrome team, where my day-to-day focus is making the platform faster. We also work on the APIs and infrastructure that enable developers like you to build nice, fast user experiences. Although I should really say fast and resilient, because as it turns out, building the fast path is not enough.

A good example of this is a pattern I see very frequently across many different teams. It goes something like this. The team really buys into the vision that performance matters. They understand it, they internalize it, they have the metrics, and they follow all the best practices. They optimize their app and release it. They put processes in place, they run audits, they even gather real user measurement data to understand how the app is performing. And then, lo and behold, they look at the data and say: hey, we made everything fast. We did all the things you told us. But despite that, we still get these reports. When we look at our data, there's this really fast device, maybe even the very device I've been testing on locally, and it's not meeting my frame target, our 60 FPS budget. Or the app is on a fast network, but it's still loading really slowly. What is going wrong? I did everything right. Is there a bug somewhere in Chrome? Maybe it's the carrier. Maybe the phone is at fault. Do you know of anything that could have caused this?

And my answer, most of the time, is: yeah, probably. There are a million things that could go wrong. But I think you may be asking the wrong question. Fundamentally, the problem is that, yes, you've made the app fast, which is great. Congratulations, you're doing everything right. But you forgot to make it resilient. And this is a subtle but important shift that we need to internalize, because our intuition as developers often fails us in this regard.

So what is resilience? The dictionary definition is the ability to maintain an acceptable level of service in the face of faults and challenges to normal operation. OK, so what's an example of a fault or a challenge to normal operation? There are millions of them. Maybe capacity within the carrier network is limited: a lot of people at one event, and only so much spectrum to go around. Maybe you have an unresponsive server, or your provider is overloaded. Maybe there's a network outage. Maybe the device is in a low power mode, and as you'll see later, a device in low power mode runs slower. Maybe a background process on the user's phone just kicked in, and all of a sudden your app is not performing as well. Maybe the device is under memory pressure. The point is, there are a million things that can go wrong, and they span from the device in your hand, to somewhere in the network (the carrier network, the public internet), to your server, or even your database. And there is no special carve-out for "but the user is on a 4G network" or "the user has the latest flagship phone." It doesn't matter what device you have or what network you're on. These things can happen. And they do happen.
And that's what we need to understand. So the message here is that fast is not enough, and I'll try to motivate why. Fast devices are often slow. Fast networks are often slow. And variability in performance, this is the critical part, makes things feel slow. Despite our best intentions, our best practices, and all the work we do to engineer great experiences, variability is what kills the user experience, because users don't get reliable performance out of the application. But despite all of this, it is still possible to build great user experiences, and we'll see some examples. There are simple patterns that, once you understand these problems, will help you build much, much better applications.

So let's take a step back and walk through, from beginning to end, how we evolve to this point and what kinds of practices we put in place. In my day-to-day work I get to work with many different teams, both within Google and outside, and over time I've developed what I call the three stages of performance enlightenment. When I talk to a team, I usually find them in one of these stages. The first is performance as a bug: they reach out because they previously didn't care about performance, but something happened that made it such a big issue that now it's "okay, let's fix it, let's add performance to our app." That's damage control. In the second stage, they've internalized that performance is a feature, just like any other, and they start adding robustness to the app. And the third stage, which I'm hoping to convince you of today, is thinking about resilience and what you have to do to get there.

Let's step through them one by one. Performance as a bug: as I said, most of the time performance is notable by its absence. You've accrued enough technical debt, or something has gone wrong, things have fallen on the floor, users are complaining, the product manager is angry, and all of a sudden there's a fire drill to fix performance. So performance gets fixed after the fact, in as much as you can actually fix a faulty foundation. I intentionally use the Tower of Pisa as a metaphor here. If you're looking for some fun reading, I encourage you to read about the engineering efforts that have gone into keeping the tower at just that angle. They can't actually make it straight, and they probably don't want to, to be honest, because it's a landmark. But despite all of that engineering, they've poured tens of millions of dollars, if not hundreds of millions, into keeping the building upright, and they know the best solutions they can come up with today will only last about 50 years. They're just paying down that technical debt, knowing the solution will still fail. And that's similar to performance: very often you can't add performance after the fact, just like you can't add security or accessibility after the fact. You have to think about it from the very early stages of development. And that's how you get to the second stage: performance is a feature.
At that stage, the team understands performance from beginning to end. If you're building an application for a market where you expect lots of users with limited or poor connectivity, or users who are very sensitive to data usage, your UX designer knows that even though they'd love to put that really pretty cover image at the top of the page, because it appeals, it fits the brand and all the rest, they can't, because it would hurt performance. It would hurt the actual experience of the user. Developers understand that they need to pay attention to performance. And when you release the app, you have processes in place to complete the cycle: monitor performance, detect regressions, and fix them. It becomes a learning cycle where you make changes, you monitor, and you adapt over time.

This sets you on the path to what I call robustness, where you develop a collection of best practices that help you build, and keep delivering, great applications. There's a long list of examples of what goes into this. Code reviews: the most basic form of preventing failure. We're smart developers, but even the best of us make mistakes, silly mistakes, performance mistakes, and it's good to have a friend or coworker sanity-check your code. That's one way of building robustness. We have test suites, which detect when a change in point A regresses some component B we didn't even expect to touch. We add redundancy into the system: we duplicate data across data centers, we run backup servers, all so that if one component fails, the system as a whole continues to function. We add processes to mitigate failure: when you get a spike in traffic, you don't just collapse because your servers are overloaded. You start shedding load: you say, I'm going to intentionally drop 10% of my traffic, and those users will not get a response, but the other 90% will. That's a bad outcome, but it's significantly better than failing all 100%. So you build these things into your process. And finally, you simulate failures. At Google, we take down entire data centers intentionally to make sure the applications we run aren't built to live in just one data center; your application should keep working if we take out an entire data center. That's a regular fire drill that all teams have to go through.

Of course, some of you are sitting here mentally checking all the boxes: yeah, we've done all of this, I can pack up and head out the door. As engineers, these are all the best practices we've been taught. Except, let's consider this case. Say you're building an awesome app, and your awesome app is, in fact, pretty awesome. You have really good uptime, which you're measuring continuously.
And if you look at the response times for your service, you see 250 milliseconds at the 75th percentile and two seconds at the 99th, which by all means and measures is pretty good. Those are tight bounds, and you understand your performance curve. Now, the app is also fairly complicated, with lots of functionality, so you have about 16 critical resources. Why 16? According to HTTP Archive, the median web page today has 16 critical resources, so let's just go with that. Say we have 16 resources; I have a question for you. What fraction of page loads will take longer than two seconds?

It's a tricky question, so let's do some math. I know you didn't sign up for math, but we're going to do some math. We know that 1% of requests take longer than two seconds; that's our 99th percentile. Assume the requests are independent and we're making 16 of them. What is the probability that all 16 finish in less than two seconds? The page takes more than two seconds if any one of those requests does. Crunch the numbers, and you can verify the math yourself, and the answer is 15%. This is where I should insert a really long, one-minute dramatic pause, because 15% should be pretty surprising. Your 99th percentile is two seconds, you have 16 requests, and roughly one in six or one in seven page loads will take more than two seconds. I find that very surprising. It does not fit my intuition; I have to redo this math every single time to prove to myself that it's right.

So what can we do to fix this? I have three suggestions: reduce the number of critical requests, then reduce them again, and then reduce them again. Which makes sense: the fewer dependencies like this you have, the better your odds of avoiding this case. But say you actually take your page from 16 critical requests down to five. Repeat the same math and you'll find the failure rate is about 5%, which takes us from one in six page loads to roughly one in twenty. Definitely an improvement, but I think you'll agree that's still surprising and not very good. We should be able to do better.

One interesting takeaway here concerns robustness: we built a system with very high uptime, very good latency, tight bounds, and it's still not sufficient. One in twenty, or one in six, some significant portion of our traffic, sees a two-second-plus load. It turns out we have a lot of experience with this. Jeff Dean, who works on Google Search, has given a number of really good presentations on exactly this problem, and published a paper ("The Tail at Scale") that I encourage you to check out after this talk; there are recordings of the presentations as well. He sets up this very problem: a search query comes into Google Search, and to search the web and give you back results, we fan that query out to hundreds, if not thousands, of servers. You can think of verticals, like images and local, and we dispatch the query to each of them.
To compose the page and give you back that set of results, we wait for all the responses to come back, rank them, and return the top 10 blue links. Seems simple enough, except the fan-out is large: not 16 requests but hundreds, if not thousands. And he demonstrates, using the same two-second 99th percentile, that if you make 100 requests, 63% of your search queries, well over half, will take more than two seconds. Which is unacceptable. He then walks through, in a very compelling manner, why you cannot fundamentally eliminate this variability. You can certainly make your latency tails tighter, you can spend more money on better hardware, but you cannot eliminate it. There's virtualization; there are a million reasons why a request may take longer than expected.

But, he goes on, we can still fix this. We can build additional functionality into search that addresses the problem. He gives a long list of examples; I'll highlight three. First, tolerate inexact results. If we've only searched 99% of the index within 200 milliseconds, we don't have to wait; we return the results from that 99%, and the user probably won't even notice. It's not the best outcome, but it's objectively better than waiting multiple seconds for 100% accuracy. Second, bound your composition. If you're composing a homepage from many different components and widgets, perhaps you don't wait for every widget before displaying the page. You set a cutoff: every widget must come back within this amount of time, and if it doesn't, you deal with it intelligently. Amazon, for example, has this policy when composing their homepage: hundreds of components, a bound on every request, and if a component doesn't respond in time, it simply doesn't show up on the page. They deal with it explicitly. Third, abandon slow subsystems. You can say: spelling correction is taking a really long time for whatever reason, maybe it's a bug, maybe something's wrong, so we're dropping that functionality at runtime. The search infrastructure knows how to disable it, and it adjusts dynamically. It's not that some engineer gets paged with "hey, this subsystem is really slow right now, do you want to do something about it?" Instead, the system says: you told me the latency target, I'm going to enforce it, this component doesn't conform, so I'm taking it out. And before putting that in place, we've already talked through the failure case and what it means to disable your part.

This is interesting because, fundamentally, the message is: you have unreliable components, some of them will fail some fraction of the time, and you cannot fix that. But despite that, you can still build a great user experience. So this is great, right?
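As an aside, if you want to check the arithmetic from the last few minutes, here it is in one runnable snippet. The only assumption is the one made above: the requests are independent.

```js
// Probability that a page load blows past the two-second 99th
// percentile when it depends on n independent critical requests:
// each request is "fast" with probability 0.99, so all n of them
// are fast with probability 0.99^n.
const pSlow = (n) => 1 - Math.pow(0.99, n);

console.log(pSlow(16));  // ~0.15: roughly 1 in 6 page loads
console.log(pSlow(5));   // ~0.05: roughly 1 in 20
console.log(pSlow(100)); // ~0.63: the search fan-out example
```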
But if you think about it, our problem space as web developers is even more complex, because what I just described looks only at that green box on the left: Jeff focused on a search query arriving at Google's infrastructure. As developers, we also have to deal with the other components: the device where the application is running, and all the stuff in between, the public network. So let's talk about that, because it's another huge source of variability in performance.

Today, you can walk into stores around the world and pick up very different phones. Around here, you can walk in and get yourself a Nexus 6P, which has great functionality, great performance, lots of storage, lots of memory, and all the rest. Just as likely, somewhere else in the world, somebody will walk into a store and pick up the device on the left, which has significantly fewer resources. And some of us are sitting here thinking: sure, it's kind of crazy that we have such a disparity, but honestly, aren't most users buying towards the high end? Isn't the $33 phone an outlier? Maybe you were sitting in this very room for the earlier talk where Tal was talking about building for the next billion, and I think she illustrates the problem very compellingly. She has a slide showing where the new users are coming from: the darker hues represent the density of new users who came online in the past year. Does anything stand out? India and China, as one example. For context, let's dig a little deeper: the number of new internet users who came online in India last year is a third of the US population. That's a lot of users. Drill into the data a little more and you'll find that the users who are not yet online, and will be coming online, are in India, China, Indonesia, Pakistan, all of these countries. So it's not just the next billion; it's the next several billion. That's where the growth is coming from; that's where new users are picking up their phones. She then goes on to describe the constraints of that environment, and one constraint you'll discover is that people there are much more cost conscious. They are, in fact, picking up phones at the other end of the range: not the Nexus 6P, but the other phone. And that should scare you a little bit.

So fundamentally, the way I think about it, we have two competing trends. There's the trend for features and performance, which is what we hear about most as developers: every six months there's a new shiny thing with more cores, more cameras, more fingerprint sensors, whatever. Smell sensors, I don't know; they'll come up with something crazy. And on the other hand, there's the trend for price, where first and foremost the question is: can I make this phone cheaper? Yes, the components get upgraded, yes, they get a little faster, but price comes first. So the way I think about it, there's an expanding range of performance.
The head of the curve keeps accelerating, because we keep adding new features, new capabilities, more speed and performance. But the other end is only slowly inching forward. So the actual range of performance is getting wider and wider, which creates problems for us as developers. I call this the performance inequality gap: the flagships will continue to accelerate, cost is critical at the other end, and the range will only grow further apart.

OK, fine. Let's say we actually do have the Nexus 6P. It's a great phone. It can do all kinds of amazing things: decode high-resolution video, allocate lots of memory for editing video or anything else, connect to all the latest 4G infrastructure and get great performance. That's all true, but I have a question for you: will it? Because even though you have this device, there's a bunch of constraints that may conspire against you. For example, the Nexus 6P is advertised as octa-core: there are eight cores in the device. Drill in a little further and you'll find that the architecture in these new phones is what's known as big.LITTLE: there are two different sets of cores, big cores and small cores. It's not that eight cores beat four because you get two times the performance; the big and small cores exist to optimize for power efficiency. It's not about CPU cycles, it's about energy. It turns out to be significantly cheaper, in energy terms, to run a workload on the slower cores than on the fast ones. So these devices actively migrate your workloads between the big cores and the small cores, and that is a big source of performance variability right there.

The way I find it helpful to think about this: say you have a dual-core device. One of those cores can be shut off; if the phone needs to preserve energy, it can simply say, look, I'm going to run on one core. A quad-core device will probably have two fast cores and two slower cores, and I think of those as almost two separate subsystems. It's not that you're going to run all four cores simultaneously; that is a very effective way to turn your phone into a space heater and then into a brick. Yes, it can do that, and it will do that when you run a benchmark, but that's not sustainable long-term performance. So the device can run fast, but more likely than not, your workload will start on the fast cores and migrate to the slow cores. And if you compare just the slow cores, pretend for a second that the device is in energy-saving mode or overheating, so the big cores are off, all of a sudden this quad-core device is within spitting distance of the one above it; it's just slightly faster. Once again, you have this very wide dynamic range: just because the device can go fast does not mean it will always go fast. As I've mentioned, if the user is running low on battery, Android, for example, kicks in its power-saving mode. Guess what's the first thing to go when that turns on? Your big cores.
And if you run some benchmarks in that state, I think you'll find a dramatic difference in performance; in my own tests I've seen as much as 40 to 50%, depending on the device. That's a pretty big range. If the device is overheating, say it's a sunny day and your phone is on the dashboard, guess what: you're not running on the big cores. Kind of silly, but that's how it is. Maybe there's a background process; maybe something else is consuming CPU cycles. There are so many variables you can't control, so expect variability. Some of those reports we started with, "hey, I have this fast device, but sometimes it's not hitting the FPS I expect," well, yeah, probably, because there are things out in the world you have to anticipate, and they won't give you the consistent performance you see when you put your phone on a desk and benchmark it right there with the AC on.

So that's CPU, and that's just one example. The same argument applies to memory and, of course, to networks. Has anybody here had a slow 4G connection? Is 4G always fast for you? No. Just because you're on a fast network does not mean you'll go fast all the time; nothing in 4G guarantees it. Yes, the peak performance is much higher, and if you're lucky, in a good area with good coverage, not too many people around, the network weather is nice, and you're looking right at the tower, you'll get really good performance. Otherwise, expect failures.

Further, the other interesting and surprising fact is how often users are actually offline. Not necessarily offline in the sense that the phone says "I'm offline"; it says it has signal, but effectively you're offline. It's the lie-fi problem. I dug up some data in Chrome: we have telemetry that tracks how often a page fails to load, and the first time I saw these numbers, they shocked me. What I'm showing you here are stats for failed navigations. These are main navigations: the user typing a URL or clicking a link, trying to open, say, your website, and the navigation fails. This is interesting data because these requests never even make it to your servers; you don't know about them at all. Which begs the question: if no server was there to hear the request, did the request really happen? Yes, it did, and we've all experienced this. I'm showing ranges here, say two to 5% failures at a certain percentile, because the rates vary between countries and continents; connectivity is simply better in some places. The first time I saw this graph, I thought: this doesn't really line up with my experience; maybe it's a problem in those other countries, the ones with bad connectivity. So I started digging further. Some time later, I was reading a report from OpenSignal. They publish quarterly reports on LTE performance in different regions of the world, and there's a really interesting gem in there. Look at the bottom.
In Germany, Italy, France, and the UK, the chances that a 4G subscriber will connect to an LTE network are a little better than a coin flip. Why? That should be surprising; it certainly was to me. You'd expect North America and Europe to have this covered. Turns out that's not the case. So I dug a little deeper into the OpenSignal data. Let me compare, say, Vodafone. I'm not picking on Vodafone; it was just a convenient example. Look at the stats for Vodafone India and Vodafone UK side by side: roughly the same. You can see that 4G performance, in terms of peak download and upload speeds, is definitely better in the UK. Interesting. The latencies are about the same. But the thing that really stood out to me is the section at the bottom. The way OpenSignal measures this, they run an app on your phone which periodically pings their servers, just to see: am I connected? And if so, maybe it runs a test to see how fast it can upload or download data. The reliability number shows how often they were actually able to make that connection. The phone thinks it's connected, it's showing you the bars at the top of the screen, but the connection fails very frequently. For example, look at 2G: Vodafone India shows 75% reliability, which means there's a one-in-four chance that even though you have the 2G symbol on your phone, the request will just fail. And this is the experience where the user clicks and the page just sits there; it can literally take a couple of minutes to even get to the Chrome dinosaur, which is not a happy ending.

So this was not looking good, and I kept digging. It turns out OpenSignal also has a deep dive into the UK in particular, with numbers on coverage for Vodafone users and how much time they spend on each network: how much time on 4G, how much on 3G. Combine that with the reliability data and multiply it out. If I'm on 4G roughly 40% of the time, with 97%-ish reliability, then about 1% of the time, even though I have a 4G signal, I can't connect. On a mix of 3G and 4G, switching between those networks, it's about 4%. And on 2G, it's about 15%. That, in total, is my effective offline time. Then there's the actual offline time, where the phone knows it has no signal and you simply can't connect, which is roughly another 5%. And if you look at these numbers, they line up pretty well with the failed navigation rates I showed you earlier, which I think is shocking. It certainly shocked me when I first lined them up and realized this is a much bigger problem than I thought. You can repeat this exercise for North America and you'll find very similar results. There's definitely a range of performance across countries, but they're roughly in the same ballpark. So there you have it: it's not just the emerging markets where this problem is happening.

One takeaway here is that offline is not an exception. We keep thinking about it and treating it as an exceptional state: the user is in San Francisco, they should never be offline.
As somebody who lives in San Francisco, I can tell you that's definitely not true, because I spend a lot of time offline, and the places where I'm offline are surprising. When I go to Golden Gate Park, I'm offline. It's kind of remarkable and scary.

I think we have a better analogy we can use here. We develop UX personas: when you build a product, you try to understand how users interact with the application and what their needs are. You can use those same personas to derive performance insights. As an example, take two different people with two different needs. One is an urbanite living in a big city. They have pretty good 4G coverage, as you'd expect: high density, good 4G. But because it is high density, they quite frequently find themselves on 3G, because there just isn't enough capacity. And, surprisingly, they spend a lot of time offline. Why? One of the most common culprits is public transit. You get on the subway, and each and every one of us has experienced this: sometimes it just does not work. So they spend significant time effectively offline. Compare that to somebody living in a more rural area. The 4G coverage isn't there yet, the density isn't high enough, so they spend a lot of time on 3G, fall back to 2G in the worst case, and are occasionally offline. They may be slow, but they're not permanently offline. The interesting message here is that there is no such thing as a 4G user. Somebody sold you a 4G plan, but you're not spending all your time on 4G. You're migrating between different networks, actively migrating between different cells, and hence the performance variability. We really need to understand that offline is not only not an exception, it's a normal state for these networks. There's a range of performance, and we need to design for it. We can't just say: my app is for urbanites in North America, they typically have 4G, everything's going to be nice and fast. Nope, definitely not the case.

So, coming back to what resilience is. I claim that resilient applications must account for the growing dynamic range of performance: both the actual performance inequality between devices and the range itself. We've seen a few examples: there's latency and its variability, there's the CPU, there's memory, there's the network. Combine all of those, and that's why you see reports like: why is this user, on a 4G network, on a fast device, having a really bad time loading my application? Well, maybe they're low on battery, they've enabled data saving, and they happen to be roaming. That's a perfectly valid scenario, and one that many of our applications fail today. So performance, I claim, is a combination of things. You have to be robust: performance is a feature, and you build processes to make sure you don't regress, that you catch these things, that you can deal with failure. But there's also responsiveness: being able to adapt to the current situation, understanding what the capabilities of the device are.
And that is what gives you resilience: you're able to say, yes, this device looks like it should be fast, this network sounds like it should be fast, but right now it's not. And it's not just the network's problem; it is my problem, and I should be doing something about it. I can't just say the network is bad and therefore it's out of my control.

So I claim there are some axioms of what it means to be a resilient application. Performance is not a static characteristic. It's not enough to say "I carved out the fast path in my application and eliminated all the bad things," because of all the variability. Fast devices are, in fact, slow very frequently; you'd be surprised if you looked at your telemetry, and it's probably why you see those long tails in all of your real user measurement data. And this range is only going to increase, so the problem will only get worse. You'll find yourself a year from now saying: we're doing all this work to optimize the fast path, but the tails are getting even longer. What is happening? Devices migrate between different networks, especially on mobile. And offline, I keep drilling on this, is not an exception. You've heard many other speakers in the mobile track say this: we need to build offline-first applications, because offline is a normal state.

So how do we put this into practice? Once you understand these constraints, it's actually not that hard to change how you architect your applications; there are subtle things you can do. First of all, once again: offline is the norm. We should provide a consistent user experience regardless of the type of connectivity. Today, we make the request, cross our fingers, and hope the thing comes back quickly. That's not good enough; we should be able to control it. That means we need to be able to render something regardless of whether we're in a fast, slow, or offline state. Once you accept that, once you can render something and then say "OK, now I'm making the request," that's the app shell model. You can load the shell of the UI and show that you're waiting, that you're doing something. That's already significantly better than the user staring at a blank screen or, worse, ending up at the dinosaur. That's your baseline, and with it in place, you can do other things. How you get there with Service Worker is a whole topic on its own, and we don't have time, but thankfully Jake is covering it in detail in his talk later today at 4 PM. I really, really encourage you to check it out; he goes into what it means to actually build this, the best practices, the gotchas, and everything else.

But say you do get the Service Worker installed. The first thing I suggest you do is take control of every request flowing through your application. Every request should have a bound; none should be unbounded, in the "I'm not sure when this will come back" sense. So, as an example, using a Service Worker, we intercept each request and pass it along to the network, just as before, but at the same time we set up a race with a timer: wait up to one second, and if the timer fires before the response comes back, deal with it.
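Here is a minimal sketch of that race. The one-second budget and the synthetic 408 fallback are illustrative choices for this example, not prescriptions:

```js
// sw.js: race every fetch against a timer inside the Service Worker.
self.addEventListener('fetch', (event) => {
  const fallback = () =>
    new Response('', { status: 408, statusText: 'Request Timeout' });

  // Timer: if it fires first, fail fast with a synthetic 408 response.
  const timeout = new Promise((resolve) =>
    setTimeout(() => resolve(fallback()), 1000)
  );

  // Network: if the fetch itself fails (for example, we're offline),
  // map the error onto the same fallback, so the application upstream
  // sees one consistent "this request did not make it" signal.
  const network = fetch(event.request).catch(fallback);

  event.respondWith(Promise.race([network, timeout]));
});
```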
If the timer wins, that's the signal that the service is slow and I need to change how my application behaves. Maybe that's simply providing feedback to the user: hey, I'm still waiting, do you want me to continue? Or maybe you take the component out entirely. In the example above, I just return a request timeout with a 408 status code, and maybe your upstream application knows how to deal with that. That's one example. And note that once you have this race in place, offline is just an optimization of the race: if you know the user is offline, there's no reason to run the race at all, because you know the request is not going to succeed. Offline is no longer an exception; it's just a natural state. Just like we take entire data centers offline and it's assumed you'll deal with it, here it's assumed that requests will fail and you have to deal with it.

Once you've done that, and we've controlled some of the latency tails on the network and can provide a consistent experience, we need to adapt how the application runs when we bootstrap it on the device. How many cores does the device have? You can access that; there are APIs for it. What's the battery status? You can query that too: whether the device is currently charging, in a low power state, or discharging. You can find out whether the user has enabled the data saver option in the browser. You can check what type of network the device is currently on. You can also read all kinds of properties at runtime, like the resolution of the screen and the device pixel ratio, and adjust how your app behaves accordingly.

As an example, coming back to our request flow, you can augment it with logic that says: I'm going to intercept this request, but the user is telling me they want to save data, so instead of fetching the really high resolution image for this product photo, I'll fetch a lower resolution one, because that's what the user is asking for. If the user is on a slow connection, something like 2G or 3G, you know you're not going to get high download performance, so you can trigger the same logic: if I'm a mapping application, maybe I download fewer places or lower-resolution tiles, because getting a consistent experience back matters more than getting a high-fidelity experience back in ten seconds. That's a bad experience. Same thing for low battery and all the rest. And you can imagine using the same logic to adjust your runtime: I'm trying to drive this thing at 60 frames per second, but the device is in low power mode or something else, so I can't, and I need to change what I'm doing; see the sketch below. And this, I think, is the crucial part: you have to monitor this. It's not that you bootstrap your application, detect these things once, somehow initialize the app, and then cross your fingers and hope for the best. You keep monitoring, because you've made certain assumptions.
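Pulling those adaptation ideas together, here's a rough sketch of reading such signals and adapting a request. Note the hedges: navigator.connection, its saveData flag, and navigator.getBattery() are Chrome-specific and may be unavailable, so everything is feature-detected, and the lowres query parameter is a hypothetical convention invented for this example:

```js
// Read a snapshot of runtime capability signals, feature-detecting
// everything since availability varies across browsers.
async function deviceSignals() {
  const connection = navigator.connection || {};
  const battery = navigator.getBattery ? await navigator.getBattery() : null;
  return {
    cores: navigator.hardwareConcurrency || 1,
    saveData: !!connection.saveData,            // user enabled data saver
    networkType: connection.type || 'unknown',  // e.g. 'cellular', 'wifi'
    lowBattery: battery ? !battery.charging && battery.level < 0.15 : false,
    pixelRatio: window.devicePixelRatio,
  };
}

// Adapt a single request: if the user asked to save data, is low on
// battery, or is on a cellular connection, ask for a smaller image
// instead of the high-fidelity one.
async function fetchProductImage(url) {
  const signals = await deviceSignals();
  if (signals.saveData || signals.lowBattery || signals.networkType === 'cellular') {
    url += (url.includes('?') ? '&' : '?') + 'lowres=1';
  }
  return fetch(url);
}
```

The same snapshot can be re-read periodically, which is exactly the monitoring loop described next.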
You know what the network is theoretically capable of, but then you start observing. Every request that goes through exposes data about how long it took and how much data you downloaded. You can use that to revise your earlier assumptions: you know what, I'm on 4G, but this 4G is just not what I thought 4G should be, so I'm going to downgrade the experience, I'm going to disable this feature. Or: I can't drive this thing at 60 frames per second, so I'll disable some other features. This is not new to game developers. They continuously monitor their requestAnimationFrame loop and say: OK, I can't render these high-resolution sprites, so I'm going to drop some of them, because that's fundamentally a better experience than delivering a janky one that makes the user nauseous. It's not that far out, and it's pretty simple: use these signals continuously and adjust at runtime. That way, when the user has been running on the big cores of their Nexus 6P or some other device, and their battery drops past the 15% line and the power saver kicks in, your application detects it fairly quickly and adapts, and the experience is much better. Same thing for the network.

So, as I said, these things are not fundamentally hard; they're subtle shifts. And these are the four things I hope you take away to help you build faster, more resilient applications, because just carving out the fast path is not enough. So that's it, thank you. I'll be hanging out at the mobile web booth for the rest of the day and tomorrow, so if you have any questions, please come by and ask, or find me on Twitter or Google+. Thank you.