It was a really exciting day, but first let me tell you that the story, all names, characters and incidents portrayed in this production are fictitious. No identification with actual persons living or deceased, places, buildings and products is intended or should be inferred. Just as a curiosity, this disclaimer started in the film industry after the MGM production Rasputin and the Empress. In that film, Rasputin rapes Natasha, the character portraying Princess Irina, and Princess Irina sued MGM. She won what would be today over two million dollars in court and nineteen million in an out-of-court settlement. If you are not a native English speaker, like me, and I mean I'm not a native English speaker, this is legalese for "please do not sue me".

As I was saying, it was a really exciting day. After months and months of work, we were releasing a new feature. The team was really, really excited and thriving because of this, but suddenly our project manager came with bad news. We had a leak in the server, and I'm not talking about a small leak, I'm talking about this kind of leak. Welcome to Fast Flood, the story of a massive memory leak in FastBoot land. My name is Sergio Arbeo, also known as serabe on Twitter and GitHub. I work for DockYard. DockYard is a digital product consultancy, from idea to final product. We have QA, designers, project managers, engineers. If you're looking for some people to work with you, just drop us a line and we'll see what we can do.

If any of you present are not familiar with FastBoot: FastBoot is the server-side rendering engine of Ember. This means we can render pages on our server, serve those already-rendered pages, and let Ember take it from there. And if you're not familiar with what a memory leak is, it's basically this: a piece of data that should have been garbage collected, but for some reason is not. Let me give a clearer example of this.
Let's say we have variables one, two, three, four, five and six in our memory, and let's say we drop one, two and three. If, after this, we still have one, two, three, four, five and six in memory, we say we have a leak. It's actually good if, every time we drop one, two and three, we reliably find one, two, three, four, five and six still in memory: that means the leak is reproducible, and that's great because it will be easier to debug. But it's still bad that we drop one, two and three and still find one, two, three, four, five and six, and not just because we cannot get rid of one, two and three; it's because we don't know what else might be lingering in our memory. That said, I hope this makes things much clearer.

But why do memory leaks happen? The main reason is that something else is keeping a reference. This is almost a hundred percent of cases. There's a tiny, tiny chance, really tiny, that it's a garbage collector bug. Those usually happen in the frameworks themselves; it's really rare to see one in our applications.

As we are talking about references, we can easily build an object memory graph. This is the object memory graph tool in Firefox. We are not going to see much of Firefox here because, as we are in FastBoot and FastBoot is Node, it's much easier to work with the Chrome developer tools. I don't know about you, but this tool reminds me a lot of the file directory tool in Jurassic Park, and a colleague told me that was an actual thing in Solaris. But let's see a tool that's much more useful for us, and that's the heap profiler. Here we have two panels. In the panel above, we have all the objects in memory, along with information about them. The first piece of information is what people call the distance: that's the distance from the GC root. That's a little hard to explain.
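Before looking at the profiler columns, here is the "something keeps a reference" cause in its simplest Node form. This is a minimal sketch with names of my own; it is not code from the talk:

```javascript
// A long-lived structure keeps a reference, so the payloads can never
// be garbage collected: the classic cause of a leak.
const cache = [];

function handleRequest(payload) {
  cache.push(payload); // reference retained forever
  return payload.id;
}

handleRequest({ id: 1, body: 'x'.repeat(1000) });
handleRequest({ id: 2, body: 'y'.repeat(1000) });

// Even though no caller holds these objects any more, both are still
// reachable through `cache`, so the GC cannot reclaim them.
console.log(cache.length); // 2
```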
The distance is easier to understand from the written documentation, but the general idea is that the bigger the memory leak, the smaller this number tends to be. It's not a strict correlation, but it's highly likely. Then we have the shallow size: the size of the object itself. Finally, we have the retained size: the memory that would be freed if we freed that object. Let's look at an example. In this case, one object takes around 400,000 bytes, like 3% of memory, in shallow size. But if we freed this object, we would be freeing other values as well, and those would free almost 30% of the memory. Below this first panel, we have the retainers panel. We can send objects from one panel to the other and vice versa. This is really useful because we can look for an object in the panel above, send it to the retainers panel, and see which objects are retaining it. Really, really useful.

As I said, this is the heap profiler. We can do really cool things with it. Basically, we capture the memory state at one point in time, and the tool lets us compare several different profiles. For example, if you work mostly in the browser, we can use what we call the three-snapshot technique. The first step of this technique is to warm up our application. Let's say just starting, or starting and logging in, would be warming up. This creates a few objects in our memory. After this, we take the first snapshot. After the first snapshot, we perform the action we suspect is leaking memory, and we take a second snapshot. As we can see, after this action a few objects have been marked for collection; for example, the one in the bottom left corner. Then we repeat the action and take a third snapshot. Okay, now we have three snapshots. You might have suspected we would, because it's called the three-snapshot technique. But what can we do with them? We can do the following.
We want the objects that are still in the third snapshot: that removes all the objects already collected or marked for collection. Then we want only the objects created after the first snapshot: we are not interested in the objects created during the warm-up. And finally, we want the objects created before the second snapshot: we are not interested in the objects created after performing the action for the first time. While this does not pinpoint the exact object that is leaking, it drastically reduces the amount of memory we need to inspect.

But this is not that useful for us, because in FastBoot requests are more atomic; we don't have the kind of interaction-by-interaction leaking you see in the browser. For us, the timeline tool is much more useful. The timeline tool looks exactly like the heap profiler we saw before, but with a timeline. So let's inspect that timeline. In it, a blue bar represents the memory we are consuming. If part of that memory is later reclaimed by the garbage collector, that part is displayed as a gray bar. One more thing about memory in FastBoot: usually the warm-up involves higher memory consumption, but subsequent requests do not consume that much. Usually, after a few requests, a new application is created. If you remember when FastBoot was introduced, there were application initializers and instance initializers; it's mostly the same here: we create the application, then we create the instances, and periodically a new application is created and the old one is dropped. In our scenario, every request was leaking almost 90% of its memory. The ideal situation would look different: we would see all the requests in gray.

Okay, you'll be wondering: now we have the tools, what next? I'll tell you the process we followed and refined during this story. Step zero: we need to reproduce the leak locally. Some of you might be thinking about using git bisect. That's a really useful tool.
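By the way, the three-snapshot filter just described boils down to simple set operations over object identities. This is only a sketch of the idea; the real profilers do this for you in the UI:

```javascript
// Three-snapshot filter: keep objects that are still alive in snapshot
// 3, were not created during warm-up (i.e. not in snapshot 1), and
// were created before the second snapshot (i.e. present in snapshot 2).
function suspects(snap1, snap2, snap3) {
  return [...snap3].filter((id) => !snap1.has(id) && snap2.has(id));
}

const snap1 = new Set(['a', 'b']);           // after warm-up
const snap2 = new Set(['a', 'b', 'c', 'd']); // after first action
const snap3 = new Set(['a', 'c', 'd', 'e']); // after second action

console.log(suspects(snap1, snap2, snap3)); // [ 'c', 'd' ]
```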
If you can use it, great; in our case, since we had been using feature flags extensively and had been working on the feature for months, it was not useful for us. In any case, here is something useful for anybody: use a production build. Why? Because we want the build to be as close as possible to production. That means we might need to remove some loggers or some services, and if we were building the FastBoot application and moving it to another project, we would be doing that here too. We want to be as close as possible. One big change that really needs to be made is no minification, and that's because in the heap profiler panels we saw before, the names of the objects appear. But what if your object has no name, like a simple POJO you were passing around? Well, we have a snippet for that later.

The next step is to look for the leak in our code, or to look for changes between versions. We can approach this as if we had just received the project in its current state and inspect it as it is now, or we can look for the changes that happened in those months. For finding the leak, we followed this process. First, run the server; don't forget to use --inspect or --inspect-brk so you can use the developer tools with your Node instance. Then make one request. This idea was taken directly from the three-snapshot technique. Also, make this first request manually. This is important because sometimes you don't solve the memory leak but break the app instead, and you want to check that you are still returning the right website. Then start the timeline and, finally, make a few requests so you can inspect the code. For making those few requests, we usually use Apache Benchmark, the ab tool, with concurrency one, so you can see each of those requests more clearly. This is the snippet.
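The original snippet is on a slide not captured in this transcript; a hypothetical reconstruction of the idea looks like this: wrap the anonymous POJO in a named class so the heap profiler shows a searchable constructor name instead of a plain Object.

```javascript
// Wrap a POJO in a named class so it appears as "LeakDetect" in the
// heap profiler's object list and can be searched for by name.
class LeakDetect {
  constructor(pojo) {
    Object.assign(this, pojo);
  }
}

// Instead of passing a bare object literal around...
const payload = new LeakDetect({ user: 'alice', count: 3 });

// ...the inspector now lists this object under its constructor name.
console.log(payload.constructor.name); // LeakDetect
console.log(payload.user); // alice
```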
So that you can see the names of some POJOs while inspecting the memory, you can use this snippet, which lets you see that POJO as LeakDetect in the inspector and search for it. Or use these alternative snippets; they have the same effect. If you need several names, just change LeakDetect to the name you want: Foobar, Macarena, whatever.

Then we have step two: we need to find the dominator. Dominator is the term used in the industry; I haven't found a better one. If you know of one, let me know and I'll change the presentation. The dominator is basically the retainer we need to remove so that the leak is gone. Or we can find the dependency, because the leak can be in one of our dependencies updated during this time. Step three: remove the dominator, or change the dependency version, and win. Thank you so much. Wait. This was not that simple in our case.

We were dealing with two big problems. First, we were a fully remote team: there were four people on our team, and I think we were even across four time zones. And second, we were leaking the container. If you are new to Ember, the container is basically the registry Ember uses for everything. Everything is in there. So that's the reason we were leaking almost 90% of our memory. So what did we do? Well, after confirming we were leaking the container, and that was on the very first day, we had two approaches. The first one was to look for the owner leaking. The owner is basically the public API of the container, so we might be leaking the container somewhere, either in our code or in some of our dependencies. The other approach was to update to the latest Ember. We were not on the latest Ember for reasons I kind of disclosed already. Maybe, hopefully, and sorry for the Ember core team, the leak was there and it was not our fault. Spoiler alert: we don't know. Then we assigned tasks based on people's knowledge. For example, there was one person on our team who had updated a similar application.
So we asked him to start working on that: updating our Ember.js. Another person was the main person behind the changes for this new feature, so we charged him with going through the changes to see what could be wrong. And two of us had more experience finding leaks and inspecting memory, so we charged those people with a general investigation, approaching this as if they were new to the project.

That done, I cannot stress enough that communication is key. Communicate early and communicate often. In a remote environment, communication is really, really important; in times of crisis, even more so. Communicating early and often lets you prevent duplicated effort across different tasks, and also lets you use your colleagues as rubber ducks. Even if you think you might be wasting your time, or the time of your colleague, this is not the case, because this is a time-consuming task that also consumes a lot of morale. You really need that human contact as well.

Taking small victories before winning the war is another key concept I want you to take from this talk. First, finding the leak won't be done by one individual. While we split the tasks, the responsibility shall not be split. Why? Because the only reason one person on the team finds the leak is that the rest of the team is trying other approaches. This is really important: this is not a competition, this is a team effort. But why take small victories before finding the leak? First and most important: morale. While going through this process, even if it's just a few days, there will be really intensive days that take a toll on your morale. And why do these small victories relieve your morale? Well, they decrease the pressure. If you consume less memory, you need to restart the server less often and you get less pressure from the external services. It also improves your code base: less memory consumption means snappier apps.
And with less memory consumption, you need to inspect less memory to find the leak. And that's nice; it helps morale as well, because if you need to inspect less memory, it's easier to find the leak, at least in theory. But please don't take small victories at any price. Some improvements are not worth it. Consider that you might make a change that needs to be taken into account for the foreseeable future, every time you do something. Those changes need to be easy to drop in case you want to drop them, and they shouldn't be hard to maintain. For example, one of the small victories we took: we were using presenters in our templates, and we stopped caching those presenters in FastBoot land. That was four or five lines of code, easy to remove if we wanted to, and it reduced memory consumption by 30%. And that's nice.

But four days later, we were still at the same point. We were consuming much less memory, almost half of it. That's nice, of course, but we were still leaking like 40 or 50% of our original memory. What could we do now? This is hard to describe, because we were out of options. Okay, then we thought: this is basically what a request looks like in Ember land. If you're not familiar with it: in FastBoot land, you get the request; it goes through several middlewares, because FastBoot is basically an Express middleware; then it hits FastBoot; FastBoot goes to the router; the router creates the routes; the routes load the data from the data store; then the controller is initialized; and the controller renders the template, which uses several components to be rendered. This is a simplified and really inaccurate version, but I think it's useful for our purpose. So the first thing we did, and we did this early, like the first or second day, was to check whether it was something in our other middlewares, because we were using several of them. What we did was substitute FastBoot with a static response, and the leak was gone.
So that meant the leak was actually in our Ember application. After that, we went for what we thought was the weakest point we could attack and easily change for a static response. That's simple: the template. We removed the template and just served a static HTML page from the server. We did this and the leak was still there, which meant the leak was not in our templates or in any of the components below them. Next, we replaced the model in the route and returned a plain old JavaScript object. We did that and, bingo, the memory leak was gone. So we knew the problem was in the store. We had really, really customized adapters and serializers, so that was bad news. The good news was that we had been using those customized adapters and serializers for a really long time, so we were fairly confident our memory leak was not there. What we did, at this point, was spend a couple of days replacing parts of Ember Data and our adapters with static responses. This is not as simple as it sounds, because depending on the point, we might need to tweak different things. But after a couple of days, we found the problem was in our adapter.

Do you want to see the problem? The leak was here: in our adapter, we had a computed property for the headers. This is using the old syntax, because this happened almost a year ago. And in those headers, we were returning an Authorization header with a token injected from an addon. Do you want to see the fix? Because this is going to be really nice. The fix was this one: headers was just a getter. But why was that happening? We suspect something was happening in the request, because all the properties in the request are lazy, computed at the last moment, and we think it's a combination of that and how the injected value was resolved. But we don't really know. So my last advice would be: let go.
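For the curious, the actual adapter code isn't shown in this transcript, and the real mechanism is unknown; this plain-JS analogue only illustrates the structural shape of the change, a cached computed value versus a plain getter:

```javascript
// Before (analogue): headers computed once and cached. The cached
// object, and anything reachable from it, lives as long as the cache.
function adapterWithCachedHeaders(session) {
  let cache = null;
  return {
    get headers() {
      if (cache === null) {
        cache = { Authorization: `Bearer ${session.token}`, session };
      }
      return cache;
    },
  };
}

// After (analogue): a plain getter. A fresh object on each access,
// nothing retained between requests.
function adapterWithGetter(session) {
  return {
    get headers() {
      return { Authorization: `Bearer ${session.token}` };
    },
  };
}

const session = { token: 'abc' };
const fixed = adapterWithGetter(session);
console.log(fixed.headers.Authorization); // Bearer abc
console.log(fixed.headers === fixed.headers); // false, no caching
```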
If it's hard to reproduce, you won't be able to send a reproduction to the Ember team so they can figure it out. And maybe it's beyond your level of knowledge; maybe it's beyond anyone on your team's level of knowledge, and you cannot really find it. You can spend some time on it, but don't sweat over it. Thank you all for attending my talk at this remote version of EmberConf. It's been a pleasure talking to you, at least virtually. If you have any questions, I don't know if there will be any system in place for doing that live, but you can reach me on Twitter at serabe. Thank you.