I guess we will start, right? Yeah, okay. Hey guys, my name is Pawan and that's Janmejay. Both of us work as developers in a company called ThoughtWorks, in the Studios division, which is basically the product division of ThoughtWorks. We gave this talk a name, but what it has evolved into over a period of time is this: how do you think about performance? What are all the things you need to keep in mind if you want to scale up your app rather than scale out, and what does it take to build high-throughput apps? That's what this talk is all about. This slide is a gist of everything covered in the deck — it's more for people who look at it online, since we've shared it on SlideShare, so we don't need to go through it right now.

The context — the app in reference. Most of this presentation is a result of the last two and a half years of work that JJ and I have done on this app. The app is called Go. Go is basically a release management tool: it runs your builds, it runs your deployments. You can think of it as Jenkins and friends, orchestrated by one single tool. That makes Go massively distributed — a scheduling and process-spawning system — which makes it complicated. Go is predominantly built with Java on the backend, and we use Rails for the frontend, for the web app and the UI part of things. It's a managed environment, which means it runs on a JVM, which means we don't have any control over how the JVM does its memory management and things like that. It's managed for us, which makes it all the more tedious. Contrary to the belief that a managed environment makes life easier, we actually faced a lot of problems because of it. And like JJ already said, most of what we have done and most of our tooling has worked mainly on Linux; you can probably find equivalents on other OSes. And over a period of time, from the feedback we've got giving this presentation, we've learned that the approach is independent of tech stack. Even though we have a Java app as our reference, none of this needs to be Java-specific.

On performance work in general: if you proactively go hunting for performance problems, there's a good chance you'll find some, but the return you get for the effort — that ratio is going to be very poor. How many of you have an app that had performance issues when it was rolled out to production? Right — which means most managers go: oh, last week we had a performance issue, this week we should not have any performance issues, I'll assign you devs; devs, go and look for problems. That's an anti-pattern according to us. So our emphasis: don't be out there with a gun looking for trouble. Let the problem come to you, then bring the right set of tools to figure out exactly what the problem is and exactly how to fix it, with minimal effort. That's the key to performance work.

So — "the dashboard is slow." Probably you have a page that everyone complains about, a whole page of the app that is slow.
Or the JavaScript is slow — that's a performance issue. Builds are taking too long — that's important for a build server. It translates to: builds take too long to start after a VCS check-in. It's slow if the workers, which run on other machines and talk to the server, take too long to get work when they start executing a build. That's one kind of problem. Or: we check in to the SCM, but Go doesn't find out about it within, say, a minute — that's a problem. Or: the process is hogging too many resources — you've given it a 16-core box and all of that is required for a basic setup of one build and one agent. That's not good, right?

So basically, in the context of your app, most people think of "UI not responding" as slow. But the UI is just one facet of the app, one part of the whole. On a simple website, that might be the only thing that matters. But on a real application — say it has background pollers, parsers, things happening — defining what "slow" even is, and measuring the throughput, becomes a hard thing. Your UI may be snappy while the application throughput itself is very poor: it might not be sending enough mails, not processing enough messages, not modifying the files it's supposed to modify in the background. It can be anything, right? So just because your end users are not saying "my UI is slow" doesn't mean your app doesn't have a performance issue. In our case, that's exactly what it meant: if any of the background work is slow, the app has a performance issue — it's still slow, despite users seeing it as really fast.

So, in terms of acceptance: most work you take on needs to have some end point. Say somebody comes to you and says, we have a performance issue on our production app — what do we do? The first thing you need to do is define how slow is slow. Because some people may just be impatient, or they may have a really huge and complicated setup. This is typically the case with products: if your product gets deployed in different kinds of environments — it's not hosted, you're selling it to different kinds of people and setups — someone may have a huge or weird setup and be unreasonable without realizing it. So the first thing you define is how slow is slow: for different kinds of setups, what is an acceptable response time or acceptable throughput.

Okay, say we've defined that. Next: how fast is fast? You don't want to spend the next year fixing one problem. Typically, for geeks, a performance problem is an attractive problem — they say, oh my god, this is a really nice problem, let me put all my effort into fixing it — and then they get carried away and work on it forever. For you to figure out when to stop, you also need an upper limit of how fast is fast. So you need a lower limit of how slow is slow.
You need an upper limit of how fast is fast. That defines the boundary of your work. And these things have different definitions, right? "I as a manager want my app to have a throughput of 5 messages per second" is a different definition from "I as a user want my page to respond within 20 milliseconds or half a second." These requirements may be mutually conflicting, mutually independent, or somehow related to each other. So identifying your benchmark is important.

Also: benchmark using a realistic setup. "It works on my dev box, doesn't it?" I think we all know this one. It doesn't matter if it works on your dev box; it needs to be a realistic setup. And if it's a product, you'd better have profiles of the different usage patterns and so on. Make sure it's realistic — I don't even need to get into that.

One more thing, going back to what we said: don't go on a wild-goose chase for where the problem is. The reason we say that is because it's hard — it's finding a needle in a haystack. What you'd rather do is reduce the time it takes to fix issues. Issues will keep coming up; bugs will keep coming up. So what you need, according to us, is a repeatable and automated way to set up an environment where you can easily reproduce the problem a customer is facing. That is what matters more than any individual fix. If somebody feels we need to fix performance issues, I would rather spend my effort automating things — and not actually fix anything yet — because what you'd rather do is wait until you hit a problem, immediately reproduce it, fix it, and move on. In our experience over the past two and a half years, that's what we've found most valuable. We have automated scripts such that within 45 minutes we can get to any customer's setup with a single one-line command. That's the kind of automation we have.

And repeatable is even more important, because once you find a problem and fix it, you want to run the exact same load profile in the exact same environment, and your profiler should now show that the problem has vanished. The profiler showed this method taking 20 seconds every time; the next run shows it taking 1 millisecond. That's good — but it has to be on a repeatable, automated setup so that you can't make human errors. You can't quietly go from 3 agents on Go to 1 agent on Go and then say, oh, now my app is suddenly fast enough. That's not good.

And like we said, don't go overboard; don't give performance issues any hype. Treat them like any other problem in your system. The reason we wanted to start the talk with this is that there used to be a time on our product when any performance issue would give jitters to our product manager and to our development team. Everybody was scared — scared because of the unknowns involved. But it's like anything else — it's software at the end of the day, right?
All it took was an expertise we didn't have — one that is not very well documented, but is easy to gain. Our whole approach now is: if it's a performance issue, we treat it like any other problem. Okay, fine — it's a bug. It goes in the bug tracker. We treat it the way we treat any other piece of work. That's what we wanted to start off with. Don't go "oh my god, we have a performance problem" and panic. You can easily define acceptance criteria for it, and the moment you can figure out what the work involved is and what it means for you — and if you have automated scripts for it — it doesn't need to carry the risk factor it's typically given.

So, the typical cycle we figured out after a lot of finding and fixing issues is this — the diagram is essentially a fish, a loop. You start here: you measure what you have. Somewhere in that measurement you realize you have a performance issue. Then you fix it. You come back and measure again, and you realize — oops, my fix hasn't worked. So you fix and measure again, and you iterate as many times as you need to. Because, as we just said, you've decided your end goal; once the end goal is reached, you break out of the loop and stop.

What's the problem with starting at "fix"? The problem is that I haven't yet found the problem. You eyeball the code, say "oh, found it," and fix it. That should not be how it works at all. "It's slow" — all right, that's a performance problem, but I haven't yet figured out what exactly is slow. Is the page taking one second to load? 50 seconds? 15 milliseconds? 500 milliseconds? 750 milliseconds? That matters, right? I want to attack the 750-millisecond one first. And you have no idea how many times we didn't have to fix anything at all, because it was just a wrong memory configuration or something like that. That's why measuring is so important: if it works fine when you measure, you go back and ask — hey, user, does your process have enough heap? Is your process swapping? There are so many things you can find out cheaply that keep you from ever entering the fix phase. Because fixing is expensive — it's developer time. Say you're billing at $40 an hour; that's how much money you spend doing this. Whereas if you already have an automated setup, measurement is essentially free — it doesn't require any of that, and the automation lets you run it as many times as you want, starting from wherever you decide. Fixing is the most expensive part of the process, so avoid jumping into it. That's the message.

What if I stop at "fix"? Like I said: if I stop at fix, I haven't measured and confirmed that the fix actually works. I deploy it to the customers, and the customer says — oops, it's still slow. Guess what: I fixed nothing. So you have to go from fix back to measure.
You have to ensure that measurement shows you very, very clearly that the problem is gone — and then make the best of it.

Why do I need to iterate? Because the first fix you try may not work. You may have to try a second fix, a third, a fourth. Also, one fix might improve things by some percentage and still not be good enough: according to your benchmarks the page has to load in under 500 milliseconds, and if it's at 550 milliseconds, that's just not good enough. You have to shave off another 50 milliseconds, iterate, and figure out — now it loads in 490 milliseconds, done. And in our experience, the theory of constraints plays a massive role here: most of the time, only the first constraint shows up in your measurement. Remove it, and guess what — there are lots of other constraints that hadn't even been uncovered yet. The first fix removes only the first constraint; there may be many more to remove before you hit your acceptance criteria. That's why it's very important to iterate. Fixing the one problem you uncovered does not, typically, make your app fast.

And sometimes — this is important — you may have fixed something, genuinely fixed it, a real fix backed by measurement, and then you come back and realize that by fixing it you made something else worse. That's the nature of contention issues. Lock contention: fix it in one place and it moves to another place. If that other place is more contended than the original, you've made things worse. And guess what — that other place may be contended for by a more important part of the app than the web page or backend piece you were fixing. Then you've made not just your thing worse, but also every other piece that uses the newly contended portion. So it's very important to watch for this.

How long should I iterate? Obviously, your benchmarks will tell you. When you've achieved your benchmarks and you haven't made anything else worse, you stop.

In conclusion: unlike functional bugs, where intuition works marvellously, intuition works really, really badly on performance issues. Please do not go by your intuition. Even if intuition tells you it's a GC issue, take a profile and confirm whether it actually is a GC issue. Only when you are absolutely, doubly sure do you go and apply a fix. Always go by proof; be mathematical.

What to measure: the easy and obvious one is page response time. You have no idea how many times we get asked — oh, do you guys use JMeter? Apache Benchmark? httperf? Most people, when they think of performance testing, are really thinking of load testing, and really thinking of loading only the user-facing pages. That's the easy part. The not-so-easy part: how do you define app throughput? How do you measure it? How do you measure your message queue's performance? Your CPU churn? Your database performance? One minimal way to meter throughput is sketched below.
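The talk doesn't name a specific tool for measuring app throughput, so here is a minimal sketch of the counter-and-sample approach, assuming a worker loop you can instrument — class and method names are invented for illustration, not from the speakers' code:

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal throughput meter: workers call mark() once per unit of work
// (mail sent, message consumed, file modified); a reporter thread samples it.
public class ThroughputMeter {
    private final AtomicLong count = new AtomicLong();

    public void mark() {
        count.incrementAndGet();
    }

    // Returns work items per second over the last interval and resets the counter.
    public double sampleAndReset(long intervalMillis) {
        return count.getAndSet(0) * 1000.0 / intervalMillis;
    }
}
```

The point is that throughput gets measured at the work itself, not at the web tier — an ab run against the UI would never see this number.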
And lock contention: one thread holds a lock for a long time, and 100 other threads are waiting for that lock. If you're in a managed environment, how do I even get that information? Everything on the machine looks fine — your IO is fine, your CPU is fine — but the app is actually slow. So probably it's a lock contention issue. How do you even measure that? How do you measure environment-specific things, like memory usage and GC churn? How do you figure out environmental issues — my hard disk, for some reason, is slow? Ram was our ops person for a long time; he knows of instances where our own production server had a problematic IO driver, and we spent a lot of time trying to figure out why the hell the DB was so slow — our queries looked really non-performant — but it turned out our performance had dropped about 16x because of a bad disk. How do you even measure such things? Those are the things we think matter more in performance testing. Using JMeter and ab is easy. Using these other things, and understanding and interpreting their output — and trust me, it's a lot of data; JJ actually tried plotting a 24-dimensional cube of all this data, a good intellectual thought experiment, but we keep it aside because no correlation could be established from it — that's the hard part. You need people to look at this output and figure out what the hell is going on. That's why "what to measure" is a super important question. Don't just go with JMeter; don't just go with ab. Think of everything you need to measure.

And remember: understand your environment. Like we said, we know how to do these things on Linux, but we have no idea how to do them on, say, Solaris or Windows or Mac. If you deploy on those environments, you need to understand them really well. For instance, we are very DB-heavy — our DB is always under severe load, doing a lot of IO. Say my server is running on an SSD, and I've got the anticipatory IO scheduler on the Linux kernel. The anticipatory scheduler is just not made for that; it's made for magnetic disks. You're using the wrong tool in the wrong place — round pegs and square holes. It makes no sense. Those are the kinds of things you need to understand.

And the reason you need to understand all this is that we are talking about scale-up. It's very easy to scale out. For us, the really interesting engineering problem is scaling up — that's what we're addressing here. Most of the time your CPU is really powerful, your RAM is really fast, your IO is fast, your network is fast — and we say, okay, fine, we won't leverage any of it, we'll just scale out. For us, that's a cop-out. What we're trying to do here is understand the computer science behind it and scale up.

So, the tools we used. For user load, we use Apache Benchmark — a very nice tool — and httperf, which is also good. ab and httperf are native; JMeter runs on Java, and we actually found that JMeter itself has performance issues because of which you can't generate the load you want.
So for us, ab and httperf are much nicer options than JMeter. For app throughput: Java has tools like jstat and jmap, and you can take heap dumps. There are JVM tools, and there are equivalent tools for your VM — start exploring the ones that give you these kinds of counts. For queuing, we use ActiveMQ, and ActiveMQ exposes really nice statistics over its management console, so we use that. It might be RabbitMQ for you — most queuing systems are built with this in mind, and any serious queuing system will expose this data. For CPU churn, the most obvious one is load average; then there's iostat, and we use inotify. For DB performance, we use a profiler called YourKit. If you are working on a JVM — great, don't bother looking at anything else. It's not that expensive, it is comprehensive, and it's one of the best JVM profiling tools out there. I don't know where we would be without YourKit. When you perform open-heart surgery, a surgeon needs a particular set of tools — those are the things YourKit provides. YourKit is available for Java and for .NET, so if you work in a .NET environment, it's there for you as well. It basically shows you every single thing. [Audience: anything similar for Ruby?] For Ruby, you can generate a snapshot that you can look at.

For lock contention — again, YourKit. If you are dealing with other environments, non-Java and non-.NET, and you can't find anything that monitors lock contention, you can write your own extensions. Locks are normally implemented either as spin locks or as semaphores, and you can write a wrapper around them that records contention on a specific lock, then grep and aggregate that output to figure things out — see the sketch at the end of this passage. For managed environments: GC logs, or the equivalent for your VM. For environment-specific things: tools like netstat, ps, dmesg — basically anything that tells you about the machine itself. And the /proc filesystem is incredibly useful when you're trying to reason about anything from user space; it's the most compact source of information from the kernel side. Go back and look at it.

Next point — going back to "round pegs for round holes": figure out what the right tools are for you, and use only those. And to know which tools to use, you first need to understand what to measure. That's the chain of reasoning: tools come at the end, understanding comes first. Understand your environment and your problems, then figure out what tools to use.

Measure with the least possible overhead. Like I just explained: if you don't have a lock contention profiler, you can write a simple enhancement to your runtime. But remember — if you start flushing your lock messages to disk on every single acquisition, you're probably going to be blocked on IO all the time, and you won't get the right kind of data out of your lock contention profiler. So make sure you size your buffers appropriately. A professional profiler like YourKit is really, really good at doing that.
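As a concrete version of that wrapper idea — hypothetical, since the speakers don't show theirs — here is a Java lock that times how long lock() blocks and aggregates in memory, so nothing is flushed to disk on the hot path:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReentrantLock;

// Contention-logging lock: measures how long each acquisition waits and
// aggregates in memory. Flushing to disk per acquisition would block on IO,
// which is exactly the pitfall mentioned above.
public class ContentionLoggingLock extends ReentrantLock {
    private static final ConcurrentHashMap<String, LongAdder> waitMicros =
            new ConcurrentHashMap<>();
    private final String name;

    public ContentionLoggingLock(String name) {
        this.name = name;
    }

    @Override
    public void lock() {
        long start = System.nanoTime();
        super.lock(); // blocks here if the lock is contended
        long waited = TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start);
        waitMicros.computeIfAbsent(name, k -> new LongAdder()).add(waited);
    }

    // Dump this periodically (or on shutdown) instead of logging per acquisition.
    public static long totalWaitMicros(String name) {
        LongAdder total = waitMicros.get(name);
        return total == null ? 0 : total.sum();
    }
}
```

Swap such a wrapper in for the lock you suspect, run the same load profile, and the aggregate wait time tells you how contended it really is.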
[Audience: is YourKit open source?] No, YourKit is not open source — yeah, that's the sad thing about it. It makes sure its buffers are configured to the right size and flushes them at the right time on a dedicated thread. So emulate all of that.

So what did we fix, eventually, after going through all that exercise? We'll get into the nature of the fixes in a moment. First: it's important to understand your libraries. Read the fine print in the documentation; if you can afford to, read the code. Often the problem you see is not caused by your app — it's caused by a library, and fixing the issue is as simple as taking that library and swapping it for another one, or patching it and shipping the patched code to production. So don't miss out on that one. Don't assume all libraries are perfect.

Be mechanical. Start with measure, go to fix, come back to measure, go to fix, come back to measure, and then stop. Don't try to skip steps because you've done it ten times and feel like you know it now — you will go wrong. Rigorous execution is what works.

You should also be okay with using an exaggerated load to magnify the problem. Say a particular page is slow. It's okay to load that page really heavily so that you can see the fix clearly: a difference from 5 seconds down to 1 second is much clearer than 5 milliseconds down to 1 millisecond. That magnification sometimes tells you very clearly that you're doing the right thing. So feel free to use exaggerated load — add concurrency, a higher number of requests — to get the right data. And you have to use the same load profile to test your fix that you used to reproduce the problem, so that in the same situation you can show your code is better now.

Rigorously add up the numbers. If a method makes five calls, and the calls take x, y, and z amounts of time, add up x, y, and z, compare against the method's total time, and figure out what the delta is. Is this method taking all the time, or are the calls taking all the time? Be interested in this reasoning; don't just be superficial. A worked example of that arithmetic, with made-up numbers, follows below.

You should also think about what a complicated fix can break — you have to see all the corner cases, especially when you are fixing locks: you don't want to introduce a race condition or some kind of data corruption, and you don't want to forgo a log statement that helps you reason things out when there's a bug. With logs, it's safer to be more granular than less granular. With locks, changing granularity is generally a risky operation, and the risk is data corruption — there is a performance win, but it's risky, so treat it that way.

Attack one problem at a time. Don't attack three problems at once and then think, all right, I'll fix these three, profile again, and verify everything is fixed. No — don't do that. One problem at a time, so that you know exactly what the impact of the fix is on other areas. If I'm fixing my page load time, scheduling shouldn't go for a toss — because then I've made a more important area worse, and that's not good. I know sometimes you can take on four problems and the same patch fixes all of them. It's okay — still don't do it.
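For instance — with invented numbers, purely to show the arithmetic of self time versus child time:

```java
// Self time vs. child time, with made-up numbers: if the profiler reports
// that a method took 900 ms in total and its five calls account for 850 ms,
// the 50 ms delta is the method's own work -- so the calls are the problem.
public class SelfTimeDelta {
    public static void main(String[] args) {
        long methodTotalMs = 900;
        long[] childCallMs = {100, 150, 200, 250, 150};
        long childrenMs = 0;
        for (long c : childCallMs) {
            childrenMs += c;
        }
        long selfTimeMs = methodTotalMs - childrenMs;
        System.out.println("children: " + childrenMs + " ms, self: " + selfTimeMs + " ms");
    }
}
```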
The overhead is worth it. And having said all that: correctness always comes first. It's okay to have a slow but correct application.

So — that's what we just covered. Now, say somebody has told us what the problem is, and we understand how to go about it. What are typically the natures of the problems you fix? First, lock contention — by synchronization, we mean lock contention. Typically, one class calls another class, which calls another class, which calls the DB, and you have a lock at each level. Say you make the first one granular — that doesn't matter if the next one is still very coarse. Fix that, and the one below could still be coarse. Fix that too, and now everybody contends on the database lock. That's the nature of lock granularity: when you fix one place, the contention keeps moving on. It's very important to keep that in mind as you fix.

If you have two shared data structures that get accessed across different methods, it requires synchronization — you just cannot do lock-free concurrency there. It's theoretically impossible, so don't even bother trying. With more than one piece of shared memory being accessed together, you have to synchronize; there is no way around it. Transactional memory is an alternative — but even with transactional memory, if you have two different transactional things, you have to put them in one transaction; you can't have two separate transactions. It's the equivalent constraint: you have to put them in a single one.

Then there are globally unique objects — interned objects are pretty useful, and you'll see how we leverage them. Java has a notion of a permanent pool — the permanent generation — which keeps things like classes and the unique copy of each string. A string "foo" is equal to a string "foo" anywhere else in the system, but it might not be the same reference; interning means it is the same reference. The concept comes from Common Lisp, but it's used widely in Java. And atomic operations — you obviously know what those are: xchg and cmpxchg, compare-and-swap. A quick sketch of interning follows below.
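A small sketch of what interning buys you on the JVM — equal value, canonical reference — which is what makes the lock-on-an-interned-string trick later in the talk legal:

```java
// String interning: equals() compares value, == compares reference.
// intern() returns the JVM-wide canonical reference for that value.
public class InternDemo {
    public static void main(String[] args) {
        String a = new String("client-42");
        String b = new String("client-42");
        System.out.println(a.equals(b));              // true: same value
        System.out.println(a == b);                   // false: different references
        System.out.println(a.intern() == b.intern()); // true: same canonical reference

        // Any thread that interns the same name locks the same monitor.
        synchronized (("client-" + 42).intern()) {
            System.out.println("holding the canonical lock for client-42");
        }
    }
}
```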
Fixes jump places, unlike functional bugs. A functional bug is typically "I didn't take care of a corner case" — you fix it, and it's localized in one place. Lock contention typically moves all over the place, so take your measurements properly. Look beyond local improvements; make sure you haven't made anything else worse.

Okay, nature of fixes with respect to GC — garbage collection. When you hear from a user that the unresponsiveness is intermittent — "once every 40 seconds or so I see it being very slow, otherwise it's very fast" — it might be a GC issue. It's likely not easy to fix, because it may be in your code base if you are lucky, or in somebody else's code base — a library you use — if you are unlucky. If you are really, really unlucky, it's in the runtime itself. The tool we use here is Java's GC logs. I'm sure other VMs expose GC logs as well; if they don't, it's simple enough to hook into the garbage collector and start logging there — again, take care of buffering if you're doing it yourself.

Profile to see if the majority of allocations come from just a handful of call sites. Say there are methods foo, bar, and baz — the three most frequently called methods in my app. If foo is generating most of the garbage, I'm happy. If instead there are 200 call sites each generating 1% or 0.5% of the garbage, that's very hard to fix — fixing each one individually is extremely hard. Then play with the GC arguments and figure out what's right for you. If garbage is spread all over the place, each site contributing a few MBs, it's a super hard problem; unless you have one full year to fix only that, you will probably not deliver it. In that case you'd rather work on the GC side of your VM: use the right algorithm for you — the throughput collector is probably not the right choice if latency is what you care about. Know your maximum permitted allocation: the VM will only give you memory up to the limits you've configured. And don't configure the minimum and maximum heap far apart, because the VM then keeps resizing the heap — a modest cost each time, but when it happens under load your requests suffer. And malloc and free are expensive calls — they can end up going down to the kernel — so try not to churn through allocations.

We're going to skip over the DB section — DB has got nothing to do with managed environments. There are nice insights in the slides, but you can read that up on your own.

Back to thrashing. Thrashing is about the ratio of runnable processes to CPU cores. If you have too few cores, context switching is going to happen, and every time a context switch happens, your caches go cold. If you're not on x86, your TLB cannot be repopulated without — what is it called — a page fault, and page faults are expensive. So you don't want that ratio to be extremely skewed. A slightly skewed ratio is fine, because processes are normally blocked on IO rather than on CPU, and that keeps the world happy.

Too many locks cause too many sleeps. Locks are normally implemented as semaphores: the semaphore registers the blocking thread on the waiter list of the lock, and when the holder comes out, it says — oops, looks like this thread is waiting for me — wakes up the first waiter and lets go. The first thread gets the lock; the rest keep sleeping. So sleep-based locks cost context switches, and like I said, context switching trashes your CPU caches, L1 through L3. Keep that in mind.

Too-frequent IO: IO waits are blocking. When you try to read something, the call cannot return until the data is there — unlike a write. If it goes to the disk, it has to come back from there with your data. If you're lucky it will be cached, but most of us, I guess, are not that lucky all the time. And swap can completely kill you: a page fault becomes a page load from the disk, and the disk is extremely, extremely slow — orders of magnitude slower. Never trust your disk.

Over-logging can be hurtful — neither your OS nor your VM's buffering will save you if you're over-logging, say your logger flushes every one millisecond. The concept of buffered IO exists for a reason: you may as well load a huge chunk in one shot rather than reading one line at a time, over and over.
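A small sketch of that buffered-IO point, assuming a file path passed on the command line: byte-at-a-time reads against the raw stream versus the same loop behind a 64 KB buffer:

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Buffered vs. unbuffered reads: why loading a big chunk in one shot beats
// poking the OS for every byte.
public class BufferedReadDemo {
    public static void main(String[] args) throws IOException {
        String path = args[0]; // hypothetical input file

        long t0 = System.nanoTime();
        try (InputStream in = new FileInputStream(path)) {
            while (in.read() != -1) { /* roughly one OS read per byte */ }
        }
        long unbuffered = System.nanoTime() - t0;

        long t1 = System.nanoTime();
        try (InputStream in = new BufferedInputStream(new FileInputStream(path), 64 * 1024)) {
            while (in.read() != -1) { /* refills a 64 KB buffer in bulk */ }
        }
        long buffered = System.nanoTime() - t1;

        System.out.printf("unbuffered: %d ms, buffered: %d ms%n",
                unbuffered / 1_000_000, buffered / 1_000_000);
    }
}
```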
Keep a buffer there, or use memory-mapped IO. If you are using memory-mapped IO, please don't keep closing your file descriptor, because then the memory mapping goes away too, and you end up tearing it down and rebuilding it over and over. And assigning a very low max memory limit can cause too-frequent GC pauses: the way GC works in a native-thread system like Java is that all threads wait — "I'll go and garbage collect, I'll free some memory" — and only then does allocation resume.

So that's thrashing, and those are basically the common pitfalls — the pitfalls we have faced; there may be others we haven't hit yet. The first important problem we had was prejudice. Prejudice comes from intuition. We had this notion that "SSL is bad" — but once we put it through our measurement exercise, it turned out that a lot of our problems were not because of SSL at all. In fact, it wasn't even close to what the DB was costing us.

Next: using asynchronicity to avoid fixing the root cause. I have a lock; a user clicks on something and it takes forever to respond because it's blocked on the backend. So what do I do? I return the user's thread really fast — the request returns, which makes the click feel fast — but the actual throughput of the system is still slow. Just introducing a message queue there does not make your app fast. In fact, it can make your app slower: because it's asynchronous, the real work now happens at its own sweet time, which means instead of finishing synchronously in 5 seconds, it might now take 15 or 20. You need to understand exactly what it costs. In Java, it also means object serialization: the entire object graph gets serialized and sent across to the other side, the receiver reads back the entire object graph, and only then starts processing it. So many extra CPU cycles wasted there. For us, the prejudice was "use a message queue, you'll be fast" — and we made that mistake at one point.

Also from prejudice: lack of testing. "We can't have more than 150 workers." We're a five-year-old app, and that myth survived for the first three years. About two years ago, we said — okay, fine, what is this? Let's just go and test it. And we could easily go up to 1000 agents, 1000 workers, and nothing happened, and we were like — what was that about? In any project, you have prejudices. It's very important to set them all aside; be mechanical, be rigorous.

Similarly, lack of setup understanding: we had a misconception that a 64-bit JVM is slow. But guess what — we asked Ram to move his current setup to a 64-bit JVM, because we realized the slowness had nothing to do with that. It does have some overhead with smaller heap sizes, but for larger heap sizes it's fine; it doesn't really make you that slow. And in fact, x86-64 is so much faster in ways you won't believe: the MMU can address something like 16 times more memory per page-table unit than 32-bit; it has a dedicated syscall instruction — there's no int 0x80 anymore; your registers are bigger, and there are more of them. It's obviously faster. And typically, a 32-bit process on a 64-bit OS has to go through an emulation layer — yes, a native emulation layer — which adds overhead.
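Circling back to the memory-mapped IO advice at the top of this passage, a minimal Java sketch: map the file once and keep the mapping (and the channel backing it) alive for as long as you read, instead of repeatedly mapping and tearing down:

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Memory-mapped read: reads hit the page cache through the mapping,
// with no read() call per chunk. Map once; don't churn the mapping.
public class MmapDemo {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile(args[0], "r"); // hypothetical path
             FileChannel channel = file.getChannel()) {
            MappedByteBuffer buf =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (buf.hasRemaining()) {
                sum += buf.get(); // touches mapped pages directly
            }
            System.out.println("checksum: " + sum);
        }
    }
}
```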
Basically, this was clearly a lack of understanding on our part. These are the common pitfalls you run into on a long-running project. We spoke about this already — there are uncharted waters in all projects, so you need to be careful about them. You can read the rest up online; we're running short of time. Know what you're getting into: when you jump into the deep end of the river, the problem is not only that it's deep — it might have crocodiles in it. These crocodiles are the unknowns; you don't know what you're getting into. Make sure you know the crocodiles are waiting for you, because these crocodiles are what make everybody scared of performance issues. If you know how to get around them, you're on the right side of it.

Again, war stories — in the full version of this presentation, these war stories are all things we experienced on our project and how we fixed them. We can't share our code in a public forum, so we've recreated the same kinds of problems in sample code, and we'll show you how it was before and how you can fix it.

So, what we're doing here: we want to fetch an XML document from a remote service. The fetcher — the client — has some configuration persisted in a database, and the configuration name is passed in. So you say: use this client, give me the data — let the client fetch the document from the remote service and give me that. That's what this method does. You go to the repo, look up the client by name; if you don't have it, you create a client and put it back into the repository, persist it. Then you eagerly fetch all the documents available from the remote service and cache them locally, so that you don't keep making remote calls: the client exposes a page count — the number of documents — and you make an RPC call for each one and put them into a cache. So you took all the elements the remote service gave you, you cache them, you log them, and you return.

And please notice it has class-level synchronization on the method. Whoever wrote this code knew it had to be synchronized, because you're caching, and the cache data structure is being accessed — if multiple threads access the same cache at the same time, you need to synchronize. That's obvious. Also, if you're creating a new client, you increment its ID; and somebody outside wants to know how many clients the system has, so there's a little getter over there, which is also synchronized. That's the implementation. Somebody wrote this, it works beautifully on their local box, they even took care of caching, and they deployed it to production. Then this particular method got 1000 concurrent requests at the same time. What do you think happens? (A reconstruction of this shape is sketched below.)
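Since the speakers' sample code isn't reproduced in this transcript, here is a hypothetical reconstruction of the shape being described — one class-wide lock, eager fetch of every document, one giant list per cache entry, and an unguarded log.debug; all names are invented and the stubs exist only so the sketch compiles:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// "Before": every problem described in the talk, in one place.
public class DocumentFetcher {
    private final Map<String, List<String>> cache = new HashMap<>();
    private final Map<String, RemoteClient> repo = new HashMap<>();
    private int clientCount;

    public synchronized List<String> fetchAll(String clientName) { // one big lock
        RemoteClient client = repo.get(clientName);
        if (client == null) {
            client = new RemoteClient(clientName);
            clientCount++;                    // hand-rolled counter needs the lock
            repo.put(clientName, client);
        }
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < client.pageCount(); i++) {
            docs.add(client.fetchDocument(i)); // one remote call per document
        }
        cache.put(clientName, docs);           // ONE cache entry hiding megabytes
        Log.debug("fetched for " + clientName + ": " + docs); // giant string built
                                                // even when debug logging is off
        return docs;
    }

    public synchronized int clientCount() {     // contends with fetchAll()
        return clientCount;
    }

    // --- stubs so the sketch compiles ---
    static class RemoteClient {
        final String name;
        RemoteClient(String name) { this.name = name; }
        int pageCount() { return 1000; }
        String fetchDocument(int i) { return name + "-doc-" + i; }
    }
    static class Log {
        static void debug(String msg) { /* dropped unless debug level is on */ }
    }
}
```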
[Audience: a lot of caching happened.] Yes — a lot of caching happened; it's over-caching. What else? [Audience: everything locks on the same object at the same time.] Right. And there are subtler problems here. See this log.debug? The string passed to it gets evaluated. Anyone have an idea how big it is? It's a listing of the entire DOM you just loaded — it's going to spew out pages and pages of XML, probably. And it's logged at debug. Most people don't run their production apps with the log level set to debug, which means you are computing this string and the logger is throwing it away. And because strings in Java are immutable, that concatenation builds one string, two strings, three strings — multiple strings. In my example of 1000 concurrent requests, about 4000 strings get constructed and completely ignored. The moment the stack frame pops, the references are gone and it all becomes garbage. So in production, you're logging at INFO, you're not even using the string — an insane amount of garbage generated for no apparent reason. That's the subtler bug.

Then the synchronization: this access and that access block each other. And there's one more subtle bug: you are caching an ArrayList. Most caches, when you configure them, have a notion of "number of objects in memory." Here, there's only one object in memory per client, and that object has this many items inside it, each of them huge. Say there were 1000 accesses across 20 different clients: the cache tells you it holds 20 objects, but the cache size is 3 GB. That's nasty — that's really nasty. During heap trouble, you say — oh my god, my heap has gone massively high — and the cache obviously holds hard references, because you don't want it GC'd. So you have 3 GB of hard-referenced data, you look at the cache stats, it says 20 objects — what the hell is happening? So you immediately discount the cache: "the cache is not the problem." Like I said, these are not very obvious problems, but they are massive problems in a highly concurrent app with high-throughput requirements.

We had a slide in the middle showing the quick-and-dirty fix — it's very ugly — so we'll jump straight to the refactored fixes. To cover it quickly, since we're running out of time: we've wrapped the log.debug calls in isDebugEnabled checks, so we're not causing unnecessary GC churn. The synchronized keyword is gone from the method; it has moved down here. Instead of caching one ArrayList of all the elements, we cache each element separately, so the cache actually reflects how many objects it holds. And it's not transactional anymore — I'm not caching inside transactions, because the transaction can get rolled back after you've cached. (If you guys want, ask us offline what happens when you cache inside a transaction — it's a crazy, crazy problem. You don't want to cache inside transactions even if you have transactional caches.) Then the getClient call: the key here is per element, so each entry is cached under its own key. And the synchronization around getClient — this is called double-checked locking, an extremely useful construct. What we're trying to show you is all the things that could go wrong; we've fixed a lot of them, and this is only about 20% of the actual slides — do go back and read the deck, it's self-explanatory. A condensed sketch of the refactored shape is below, and the walkthrough continues after it.
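Here is a hypothetical condensation of the refactored version being described — again invented names, heavily simplified: the debug guard, per-document cache keys, double-checked lookup under an interned-string lock, and an atomic counter in place of the synchronized getter:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// "After": fine-grained cache entries, scoped locking, no wasted strings.
public class DocumentFetcher {
    private final ConcurrentHashMap<String, String> docCache = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, RemoteClient> repo = new ConcurrentHashMap<>();
    private final AtomicInteger clientCount = new AtomicInteger();

    public String fetch(String clientName, int docId) {
        String key = clientName + "/" + docId;
        String doc = docCache.get(key);              // first check: no lock
        if (doc == null) {
            synchronized (("client-lock-" + clientName).intern()) { // JVM-unique ref
                doc = docCache.get(key);             // second check: under the lock
                if (doc == null) {
                    doc = getClient(clientName).fetchDocument(docId);
                    docCache.put(key, doc);          // one cache entry PER document
                }
            }
        }
        if (Log.isDebugEnabled()) {                  // guard: no string built otherwise
            Log.debug("fetched " + key);
        }
        return doc;
    }

    private RemoteClient getClient(String name) {
        return repo.computeIfAbsent(name, n -> {
            clientCount.incrementAndGet();           // CAS loop: no monitor, no sleep
            return new RemoteClient(n);              // create does very little
        });
    }

    public int clientCount() {
        return clientCount.get();                    // synchronized getter gone
    }

    // --- stubs so the sketch compiles ---
    static class RemoteClient {
        final String name;
        RemoteClient(String name) { this.name = name; }
        String fetchDocument(int i) { return name + "-doc-" + i; }
    }
    static class Log {
        static boolean isDebugEnabled() { return false; }
        static void debug(String msg) { }
    }
}
```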
Again: double-checked locking — here, and here. Try to get it from the cache; can't find it, take the lock; try the cache again; still can't find it, go to the actual repo and get it; if it's still not there, create it. The create method itself is transactional, but the whole method is not, so I can safely put things into my cache without going wrong. And the synchronized keyword has gone away — it's only over here. We've basically reduced the scope of the synchronization, we're caching the right things, and the cache will now tell us the right number of objects it holds.

And how do we key the synchronization? That's really important: this is an interned string — the string is being interned here, so it's a globally unique reference across the entire JVM, and I can legitimately lock on it. The createClient method is transactional, but it does very little, so that's nice — I'm not accessing any cache in there. The synchronization on the counter has also gone away: it's an atomic operation I'm using, incrementAndGet. It works by compare-and-swap: read the value, add one, attempt the compare-and-swap; if it fails, read and try again — it's a loop. In kernel terms, this is a spin lock; in kernel space it prevents preemption, but here there isn't even a semaphore involved, so the thread never flips to non-schedulable and back, and your cache stays clean. It's a really nice construct — go back and look at all the atomic operations the JVM supports.

And with that — yeah. If you're interested, this is probably about 20% of all the things we've covered. We'll take questions offline if anyone has any. Thanks, guys.