Now, what is JRACLOUD-68845? It is a ticket on the Jira for Jira, our public issue tracker. It's a relatively active instance, but this ticket is special for two reasons. The first reason: the ticket itself is very active. There's a lot of stuff going on there. And the second reason is, because it is so active, this guy chimed in: Scott Farquhar, our co-founder and co-CEO. And don't get me wrong, I've known Scott for many, many years. Scott is a lovely guy. But if there's one thing that Scott is passionate about, it is the customer. So if the customer is unhappy, so is Scott. You can imagine that JRACLOUD-68845 was a pretty big thing for the Jira engineering team. If you woke a Jira engineer in the middle of the night, they would mumble "JRACLOUD-68845". It soon became known as *the ticket* in my team. And as the engineering manager responsible for cloud frontend performance, it was clearly my job to rally the forces and get this problem sorted. So this talk could just as well be called "Closing JRACLOUD-68845". What I want to do today is share a little bit of this story, share some of the lessons that we learned, and I hope there's something interesting in there for you.

The first thing that we did, and the thing that should stand at the beginning of every engineering project, is to measure and to understand. And measuring performance shouldn't be all that complicated. All you need to do is identify a couple of key interactions, which could be loading a page, opening a dialog, or moving a card from A to B in Jira, and measure them, right? Measure from the moment where the user starts an interaction to the moment where the interaction finishes and is ready for the user.

As an engineering manager, I'm often interested in the big, big picture. We have all these data points for our interactions, hundreds of thousands, millions. But I just want to ask a very simple question and get a very simple answer: how fast is Jira? The metric that we are using in this case is called Apdex. It's a standard; it stands for Application Performance Index. The way it works is that we categorize all our millions of data points into three buckets. The first one is for all interactions that take less than a second. We consider them good interactions: they are fast, users like them, and we reward ourselves with a point. Great. The second bucket is between one second and four seconds. We say this is an OK interaction, not great, but also not terrible, so we reward ourselves with half a point. And the third bucket is everything that takes more than four seconds. And that's, well, pretty shit, right? So we don't deserve any points for that: zero points. The next step is to average over all the points you're getting and multiply by 100. And ta-da, there's your Apdex value. An Apdex value is a number between 0 and 100, and every application has one. Your application has one. My application has one. It could be, you know, 64, 70, 78, whatever.

Now, what does this mean in practice? What does an Apdex of 70 mean? It could mean a couple of things. For example, it can mean that out of 10 users, seven have a great experience and three have a shit experience. But it could also mean that out of 10 users, four have a great experience and six have an OK experience. So Apdex, you can see, is a high-level metric.
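To make the arithmetic concrete, here is a minimal sketch of the calculation in TypeScript. The thresholds follow the talk's one-second and four-second buckets; the sample durations are made up to reproduce the two "Apdex 70" scenarios just described.

```ts
// A minimal sketch of the Apdex calculation described above, using the
// talk's thresholds: under 1s is good, 1-4s is tolerable, over 4s gets nothing.
function apdex(durationsMs: number[]): number {
  if (durationsMs.length === 0) return 0;
  const points = durationsMs.reduce((sum, ms) => {
    if (ms < 1000) return sum + 1;   // good interaction: full point
    if (ms < 4000) return sum + 0.5; // tolerable interaction: half a point
    return sum;                      // frustrating interaction: zero points
  }, 0);
  return (points / durationsMs.length) * 100;
}

// Seven great experiences and three bad ones: Apdex 70.
console.log(apdex([300, 400, 500, 600, 700, 800, 900, 5000, 6000, 7000]));
// Four great and six merely OK experiences: also Apdex 70.
console.log(apdex([300, 400, 500, 600, 1500, 2000, 2500, 3000, 3500, 3900]));
```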
But an Apdex value is not particularly good for going deep, right? It gives you one good metric across your entire application, but if you actually want to understand what's going on, if you actually want to make a change, you need to drill further. You need to go down to the milliseconds, down to the percentiles: the P50, the P90, the P95. Now, what's a percentile? Well, if you take all the data points for an interaction and group them by how long they take, you get this nice distribution. The P50 is the value where 50% of all data points are better and 50% are worse. The P90 is the point where 90% of all data points are better and 10% are worse. Now we're talking, right? This is a good level of detail.

But we can break this down further. We can break it down by sequence step: how long does it take on the server, how long does it take to download the HTML, how long does it take to download and execute the JavaScript, et cetera. We can break it down by user characteristics: how big is the Jira instance, what browser is the user using, where is the user based? And finally, we can distinguish between initial loads and SPA transitions, and I'll talk about that in a second.

And the most important thing really is, and I can't stress this enough, we want to measure in production. We are all developers, and as developers, we're actually pretty privileged. We typically operate on high-end hardware; this is a $4,000 MacBook. We typically have a pretty good internet connection. And in many cases, we are relatively close to the data center. If I'm working out of our Atlassian office in Mountain View, I'm sitting pretty much next door to the AWS data center. Our users? Well, we don't know. But typically, they are less privileged than we are when it comes to resources. So I can't stress this enough: when we measure performance, we want to do so in production.

So how does this all tie back to the ticket? Well, we instrumented and measured Jira quite extensively, and we discovered a couple of things. The first thing is that what is slow in Jira is page loads. Opening a dialog, that's OK. Moving a card around, that's OK. But loading pages, that's the thing that's slow. The second thing we discovered is that this applies to all pages. It's not that one page is particularly bad; this is a systematic problem. And the last thing, which is quite interesting: we discovered that roughly half of our traffic is internal, if you want, that is, users navigating from one Jira page to another Jira page. And the other half is external traffic: somebody clicking a link in an email or in Slack, or entering a URL into the browser.

So let's sum this up. First, page loads are slow across all pages of Jira. And second, we have a roughly 50-50 split between internal and external traffic.
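All of these findings fell out of exactly this kind of production instrumentation. As a sketch of what the measuring side can look like, here is one key interaction instrumented with the browser's standard User Timing API. The interaction name and the `/metrics` endpoint are illustrative, not Jira's actual setup; one plausible split is to ship raw durations home and aggregate them into Apdex and percentiles server-side.

```ts
// A minimal sketch of instrumenting one key interaction in the browser
// with the standard User Timing API.
function trackInteraction(name: string): () => void {
  performance.mark(`${name}:start`);
  return () => {
    performance.mark(`${name}:end`);
    performance.measure(name, `${name}:start`, `${name}:end`);
    const entry = performance.getEntriesByName(name, 'measure').pop();
    if (entry) {
      // Ship the raw duration home; Apdex and percentiles can then be
      // computed over real production traffic, not a developer laptop.
      navigator.sendBeacon(
        '/metrics',
        JSON.stringify({ name, durationMs: entry.duration })
      );
    }
  };
}

// Usage: measure from the moment the user starts the interaction
// to the moment it is ready for them.
const done = trackInteraction('board:move-card');
// ... perform the interaction, then:
done();
```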
So what does this mean? How can we use this to improve performance? When it comes to performance, I always think of this one tweet by kentcdodds. Let me zoom in a little bit. What it says is: let me tell you the secret about app performance. It is less code. That makes sense, right? Less code means less to download, means less to execute, means faster. Great. So let's look at this diagram again and maybe focus on the first half, the 50% internal traffic. Now, how does this look in practice?

Well, one example is a user operating on their Jira board and then navigating to version management. Now, if we look at these two pages, they actually look kind of similar, right? They both have the same sidebar. They both have the same menu. So what if we could only load the content of the page and not the parts that are in common? You see where this is going: I'm talking about single-page applications. The concept is really simple. Instead of breaking our application down into a couple of completely independent pages that all need to be loaded and unloaded and loaded individually, we structure it a little differently: we break our application down into the part that is common to all our experiences, the SPA shell, and the actual content of the various experiences. We can then just replace the content and keep the SPA shell loaded. This does not only feel faster, it is faster. Why? Less code. And React's component architecture actually makes it quite easy to build SPAs. There are great libraries: we can use React Router, we can use React Loadable. There are a couple of really good bundlers, like Webpack and Parcel, that do the code splitting for us. All nice.

So it's all easy if you're doing a greenfield project. But we are really talking about a transition project here, a modernization project, and that makes things a little more complicated. The first thing I would like to talk about is how the legacy code base and the modern code base coexist. At Atlassian, we often talk about moving from an inside-out model to an outside-in model. Now, what does that mean? Well, when you start a modernization project, you start with legacy code; that's why it's a modernization project. A very natural way to start the modernization is to take one component out of your page and replace it with a modern component, a React component. And you have this sea of legacy code with some islands of modern code. And these islands grow and grow and grow. That's all good; React supports this pattern well, and it is a very reasonable way to start a transition. But at some point, it becomes a little awkward. The islands start bumping against each other, and everything becomes a little weird and strange. And that's a really good indication that you want to turn this equation around and switch to an outside-in model, where your application is fundamentally modern: React owns the routing, React owns the page layout, but it may still have some islands of legacy code that are now getting smaller and smaller.

One way to look at it is the complexity curve. The inside-out model starts really easy, you just need to write a component, but it gets more complex over time. The outside-in model is pretty hard at the beginning, you need to make very important, very fundamental decisions early on, but it gets easier over time. So the big challenge is to find the sweet spot where you can transition from one to the other. Now, how does this relate to performance? Well, in many cases, your legacy code base simply is not an SPA. So in order to transform your application into an SPA, you really need to get to the outside-in model, and that is a lot of work.
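To make the target state concrete, here is a minimal sketch of the outside-in, SPA-shell model, assuming React Router v6 and using React.lazy as a modern stand-in for React Loadable. All component and path names are illustrative, not Jira's actual code.

```tsx
// A minimal sketch of the outside-in target state: React owns the routing
// and the page layout, the shell stays mounted, and each page's content is
// code-split into its own bundle.
import React, { Suspense, lazy } from 'react';
import { BrowserRouter, Routes, Route } from 'react-router-dom';

// Dynamic import() is the hint Webpack or Parcel uses to split the bundle.
const Board = lazy(() => import('./pages/Board'));
const Versions = lazy(() => import('./pages/Versions'));

const Sidebar = () => <nav>{/* common navigation, loaded exactly once */}</nav>;
const Spinner = () => <span>Loading…</span>;

export function App() {
  return (
    <BrowserRouter>
      <Sidebar /> {/* the SPA shell: never unloaded between transitions */}
      <Suspense fallback={<Spinner />}>
        <Routes>
          {/* only this content swaps when the user navigates */}
          <Route path="/board" element={<Board />} />
          <Route path="/versions" element={<Versions />} />
        </Routes>
      </Suspense>
    </BrowserRouter>
  );
}
```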
The second thing that I would like to talk about is the network effect of SPAs. That's an interesting one. I think we have established that when transitioning between pages, an SPA is faster than a non-SPA. But for the initial page load, SPAs are often a little bit heavier than non-SPAs. So you pay a little bit of a price on the initial page load. And this is where the network effect comes in. We know the network effect from phones. If you are the very first person on this planet who owns a phone, that is a pretty shit position to be in, because you need to carry this phone around, but you can't call anyone. It only gets interesting when your friend also gets a phone, because now you can talk. And when a third person gets a phone, even better, and so on: the value of the system grows dramatically with the number of nodes. That's a network effect.

And with SPAs, it is actually quite similar. Imagine you have your non-SPA with all these pages, and now you transform your first page into an SPA. At this point, you're not getting any value. You're actually making performance worse, because that SPA page loads a little slower than the non-SPA page did. Only once you transition all the other pages into the SPA as well does it get awesome. But think about phones: there was this person who had the very first phone on the planet, and ultimately, I think phones became relatively successful. So that's a good indication that we want to get over that initial bump of the network effect.

So let's look at our ticket again, JRACLOUD-68845. With an SPA, we actually found a pretty good solution for the first half, the internal traffic. But there's also the second half, all the external traffic. And external traffic cannot benefit from SPA transitions at all, because for that first request there is no client-side routing to take control of; the request has to go out over the network and hit the backend. So how do you approach that?

Well, let's maybe spend a second and think through how an initial page load works in a typical React application. It all starts with a browser, a backend system, and a CDN. The first thing that happens is the browser makes a request to your backend system. The backend system does a bit of thinking and returns a very, very lightweight HTML snippet: basically a script tag that points to the JavaScript, and an empty, or almost empty, body. What the browser now does is go to the CDN, download the JavaScript, and execute the JavaScript. What we're getting now is a page that is interactive, but it typically doesn't show a huge amount of data, because we still need to make our AJAX calls. And only now are we getting meaningful data that is also interactive.

Let's plot this on a timeline. First we hit the server and spend some time on the server; the customer doesn't see anything. We download the HTML; the HTML is pretty much empty, so the customer still doesn't see anything. We download the JavaScript; the customer still doesn't see anything. We run the JavaScript; the customer gets a page that is actually interactive, but typically doesn't show data yet. And only finally, after making our AJAX calls, are we getting a page that shows data and is interactive.
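For reference, here is a minimal sketch of that classic client-side boot. The `#root` element and the `/api/board` endpoint are illustrative, not Jira's actual code; the point is that everything the user sees is built only after the JavaScript has been downloaded and executed.

```tsx
// A minimal sketch of the classic client-side boot just described: the body
// the server sent was essentially empty, so the page is assembled here.
import React, { useEffect, useState } from 'react';
import { createRoot } from 'react-dom/client';

function Board() {
  const [issues, setIssues] = useState<string[] | null>(null);
  useEffect(() => {
    // The AJAX step: the page is already interactive at this point,
    // but only shows meaningful data once this call resolves.
    fetch('/api/board')
      .then((res) => res.json())
      .then(setIssues);
  }, []);
  if (issues === null) return <em>Loading…</em>; // interactive, but empty
  return <ul>{issues.map((issue) => <li key={issue}>{issue}</li>)}</ul>;
}

createRoot(document.getElementById('root')!).render(<Board />);
```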
So how can we improve the performance of that? Let's get back to our tweet: less code. Where can we load less code? Well, let's look at these two bars, download JavaScript and run JavaScript. In a really large application, these can actually take seconds. How can we get rid of them? Well, it's very difficult to get rid of them entirely, but maybe we can do something different. Maybe here, in the download-HTML step, instead of returning an empty body, we can actually return meaningful HTML, like, you know, divs and spans and pictures. HTML. Now, if we think about it, this is how the web used to work. This is what we did for 20 years. But of course, we still want our new awesome stack, we still want our happy developers, and you see where this is going: server-side rendering.

So how does this picture look with server-side rendering? Well, we still have our backend system, and we still have our CDN. We still make our request to the backend, but instead of responding to the browser directly, our backend now calls out to a new service, the SSR service. And the best way to think of the SSR service is really as a virtual browser that runs on our hardware. What this virtual browser does is pretty similar to a real browser: it still gets the JavaScript, still executes the JavaScript, still makes the AJAX calls, and still generates HTML. This HTML can now be returned to our backend system, returned to the browser, and shown in the browser. The important thing to understand: this is really still just HTML. There's no JavaScript involved in the browser yet. So the browser still needs to go to the CDN, still needs to download the JavaScript, still needs to process the JavaScript, and only then do we get an interactive page.

So let's look at this on the timeline again. If we look at it naively, we're actually making the situation worse, right? We spend more time on the server, and we download more HTML. So what's the value of that? Well, let's look at what the customer is seeing. The first thing is that we can remove the AJAX bar, because that's something the server has already done for us. Great. But more interestingly, we're getting a new state on our timeline, and that is rendered, as opposed to interactive. And if, and that's a big if, the rendered state is something you consider ready for your user, you're actually getting a massive performance gain. Because keep in mind that the download-JavaScript and run-JavaScript bars can take seconds.
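Here is a minimal sketch of what the SSR service's job boils down to, assuming React 18's APIs. The `<Board />` component, the data embedding, and the file paths are illustrative; this is not Jira's actual SSR service.

```tsx
// A minimal sketch of the SSR step: run the React code on our hardware
// and return real HTML (divs, spans, pictures) so the browser can paint
// before a single byte of JavaScript arrives.
import React from 'react';
import { renderToString } from 'react-dom/server';

function Board({ issues }: { issues: string[] }) {
  return <ul>{issues.map((issue) => <li key={issue}>{issue}</li>)}</ul>;
}

// The "virtual browser" output: rendered markup plus the data it was
// rendered with. (Escaping of the embedded data is omitted for brevity.)
export function renderPage(issues: string[]): string {
  const html = renderToString(<Board issues={issues} />);
  return `<!doctype html>
<div id="root">${html}</div>
<script>window.__DATA__ = ${JSON.stringify({ issues })}</script>
<script src="/static/app.js"></script>`;
}

// On the client, the markup is already on screen ("rendered"); hydration
// then attaches the event handlers and makes it "interactive":
//
//   import { hydrateRoot } from 'react-dom/client';
//   hydrateRoot(document.getElementById('root')!,
//               <Board issues={window.__DATA__.issues} />);
```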
The million-dollar question here is: what do we consider ready for our users? We are now introducing two states that an application can be in: rendered and interactive. These two states also exist without server-side rendering, but there they sit very close to each other; they're virtually the same. With server-side rendering, we're moving rendered forward, and we're actually making interactive a little bit worse. So we're creating a gap, and this gap is a problem. People sometimes call it the uncanny valley. Let's maybe go through a concrete scenario. Let's say you hit this beautiful dialog here, and as a user, you want to click the submit button. You click it, and nothing happens. You click it again, nothing happens. You give it a third try, click it again, and now something happens. As engineers, we know what's been happening here: it took a while for the JavaScript to be loaded, for the JavaScript to kick in and actually make our page interactive. For the customer, this simply doesn't make any sense.

And there are different ways to solve that, but I think the most important question we need to ask ourselves is: does the time it takes for the customer to simply read the page, to digest what they are seeing in front of them, take long enough for us to bridge the uncanny valley? We can reason about that, and we can debate about that, but let's maybe take a step back. The best way to figure it out is to measure and to understand.

So let's get back to our ticket. We talked about having roughly 50% internal traffic and roughly 50% external traffic, and we actually found pretty good solutions for both of them, right? We found an SPA model that solves the first half for us, and we found an SSR model that makes a big improvement on the second half. And both of them really could be solved with the most important tweet on Twitter: with less code.

So, kentcdodds writes "less code", but he also says this, and I'll give you a second to digest it. What he says is: in the end, we can't write no code. We're building rich applications; clearly, they need to have some code. And this is where I think the third big learning comes in, and that is to be opinionated. You know, in your company, you have these teams of rocket scientists. They want to move quickly, they want to move independently, they're doing fantastic work. But there's one thing that we always need to keep in mind, and that is that all our engineers, all our teams, are operating on a shared, limited resource: the browser. And in the history of mankind, there has only ever been one way to sustainably use a shared, limited resource, and that's through rules and conventions. So what you really, really want to do is think hard about the patterns, practices, and technologies that you want to align on as a team, as a company.

And another thing that's really important: you can have the best rules and the best conventions in the world, and they are useless if people don't follow them. We are all humans, and humans make mistakes. Robots don't. So we at Atlassian are borderline obsessed with all kinds of static code analysis. Don't try to read this, but what I'm trying to convey is: we have a huge amount of custom-developed ESLint rules that encode the patterns and practices we consider important for our performance. And we also developed our own tool. It's called Stricter, and I won't go into detail, but you can think of it as ESLint, but for your file system. If this sounds interesting to you, if you want to contribute, check it out; it's open source.
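To give a flavor of what such a rule can look like, here is a minimal sketch of a custom ESLint rule typed against ESLint's Rule API. The banned module, the suggested alternative, and the message are all invented for this example; they are not Atlassian's actual rules.

```ts
// A minimal sketch of a custom ESLint rule that encodes a performance
// convention: prefer a tree-shakeable date library over a heavyweight one.
import type { Rule } from 'eslint';

const rule: Rule.RuleModule = {
  meta: {
    type: 'suggestion',
    messages: {
      noHeavyDates:
        'Import from "date-fns" instead of "moment": it tree-shakes, so less code ships to the browser.',
    },
    schema: [],
  },
  create(context) {
    return {
      // Flag every `import ... from 'moment'` in the code base.
      ImportDeclaration(node) {
        if (node.source.value === 'moment') {
          context.report({ node, messageId: 'noHeavyDates' });
        }
      },
    };
  },
};

export default rule;
```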
And this brings me to the end of my presentation. What I always like doing at the end of a presentation is make a really bold statement, something with a bang, something like: SPA and SSR solve all your problems. Let me do this again: SPA and SSR solve all your problems. The thing is, I can't, because the world is a little more complex than that. What I can say is that you want to measure and understand your application; you want to understand your problem space. The second thing I can say is that you want to seek systematic solutions. If you think about it, SPAs and server-side rendering are systematic solutions: they work for every page of your application. And even establishing better practices and technologies is a systematic solution, because it applies to your entire code base. And last but not least, do the right thing for your application and do the right thing for your user. Every application is different. What I can say is that for Jira, introducing an SPA and introducing server-side rendering made a huge impact. Thank you very much.