Thank you all so much for being here. There were so many good talks in this time slot, so it's really exciting to see so many people attending mine. This is wonderful. As Valarie mentioned, my name is Jamie Gaskins, and I'm from Baltimore. Let's go ahead and jump right into it. Every time I've rehearsed this talk, it's gone a different amount of time, so we'll see what happens. First, I'll stop hitting the spacebar that hard. When we talk about optimization, we need to understand what that means. We need to turn this fuzzy term into something more concrete that will allow us to communicate about it better. At a high level, optimization is about improving some metric that you've chosen: you've chosen a metric to improve, and you want to move it in some desired direction along some particular axis. To improve a metric, we make changes to our system and then measure the effects. In this example, we're optimizing RAM consumption, so we want to move that metric toward zero. We want to take the amount of memory we're consuming in production and reduce it. The desired direction I mentioned before is toward zero, because the less memory we use in our application, the more we have on tap for spikes in volume. So this seems easy, right? Mission accomplished. But are we really done? What other metrics did we look at? Did any of them get worse as we improved this one? Especially as systems grow in complexity, optimizations might have consequences that are not easy to see and definitely not easy to anticipate. So let's say, for example, that memory consumption was high because we were using in-memory caching inside of a hot code path. If we remove that cache, we're now recalculating the same handful of values on each pass through that code.
And as a result, instead of our Ruby processes each using a fraction of a CPU core, each one is potentially using an entire core for itself. So some of the things we're going to go over today are: what are metrics, and which metrics do we need to think about? How do we communicate about performance? And what are some of the trade-offs that we may be introducing into our code base? First, some of the metrics we need to think about. You may want to optimize RAM consumption when your production machines are starting to approach 100% RAM usage. App boot time can be problematic if rebooting your production app causes significant delays in processing. Milliseconds per transaction can be important if customers have to wait for that transaction to finish before they can proceed. And when I say transactions, I don't necessarily mean a monetary transaction or a database transaction. It's just a generic term for a unit of work: a web request, a background job, processing a message that came in over your message queue, anything like that. And transactions per second is an important metric for tracking how much you can process at scale. We also have some other metrics that we're not necessarily thinking about consciously, but they're in our minds somehow, and they are things we need to think about. Some of these are things like time from feature inception to release, which is a good measurement for the initial deployment of a feature or service. You might call it a greenfield metric: once you begin working on a new feature, how long does it take before customers see it? Not necessarily until it's done completely, but before somebody has it in hand. Do you count the amount of time it spends in your task-tracking backlog? Why would you, or why would you not? These are just things to keep in mind. Time between deploys can be useful to see how granular improvements to an app are over time.
Do we need to make a lot of changes to existing code before we can deploy this new feature, or can we just add new stuff in? Time from bug discovery to deployment of the bug fix can be useful in understanding the team's firefighting capabilities and the ability to respond to errors that crop up in production. Is it becoming more difficult to fix bugs that we're introducing? That's a metric we can use. Again, these aren't necessarily things we should be optimizing for, but they might be things we're optimizing for subconsciously. We just need to know that we're doing it. Our next step here is talking about communicating about performance, because most of the time when we talk about optimization, we just throw that word around. Just like with optimization, two people discussing the word performance might actually be talking about different things. So let's break that down a bit. When we talk about performance, we're usually talking about how fast something executes, and when you talk about the speed of anything, you're talking about time. So discussion of performance typically revolves around one of these concepts: how long it takes to do something once, and how many times you can do that thing in a given time period. It turns out these can be very different metrics. It's tempting to think of them as being inversely proportional to each other, and in a lot of simple cases they very well can be. Especially in cases that are easily demo-able, something you could hack out in a few minutes, they almost always are inversely proportional. But in complex systems, there can be a lot of other factors. For example, if you can parallelize the process, the cost of doing it 10 times might be the same as doing it once. The time cost, that is, might be the same as doing it once, assuming you haven't reached some sort of CPU or IO limit.
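That parallelization point is easy to demonstrate. Here's a small sketch, assuming an IO-bound task (simulated with sleep instead of a real network call): doing it ten times in threads costs roughly the wall-clock time of doing it once, because MRI threads can overlap IO waits even with the GVL.

```ruby
require "benchmark"

# Simulated IO-bound task (standing in for a network call or disk read).
def fetch_value(i)
  sleep 0.05
  i * 2
end

# Sequential: ten tasks cost roughly 10x the time of one task.
sequential = Benchmark.realtime { 10.times.map { |i| fetch_value(i) } }

# Parallel: threads overlap the waits, so ten tasks cost roughly 1x.
parallel = Benchmark.realtime do
  10.times.map { |i| Thread.new { fetch_value(i) } }.each(&:join)
end

puts format("sequential: %.2fs, parallel: %.2fs", sequential, parallel)
```

This only holds for IO-bound work; a CPU-bound loop in MRI would not speed up this way because of the global VM lock.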
If you can cache the result, the cost of doing it 10 times might be anywhere from 1 to 10 times the cost of doing it once, depending on the number of unique inputs to your caching function, how you derive your cache key. For a deep dive on how caching can help or harm your app, you definitely want to check out Molly Struve's talk, right in this exact same spot, just after the break. If your service runs on JRuby or some other Ruby implementation that contains a JIT, a just-in-time compiler, the nth iteration of your code might actually run in a fraction of the time of the first iteration, because on the first iteration the JIT and the code optimizer were cold. They hadn't actually been run on your code yet. On later runs, that's no longer the case, and they may have applied some JIT compilation, dead code removal, method inlining, things like that, to your code as it runs. These are great tools, but they do mean that your code will likely run slower the first time, and that's something you need to be aware of, especially while optimizing. Those are a few ways that repetition can be faster than proportional, but it can also go the other direction. For example, if the first iteration doesn't end up invoking the garbage collector, but subsequent runs do, then you can end up taking longer on subsequent iterations of that code. And that's just when we're talking about CPU time. Once we start adding IO to the mix, we open up a whole new world of variability in optimizing for one or the other, because then it becomes even more nebulous. And this, to me, is when things actually get really interesting. This is when it starts getting into that complicated segment. Requests made to remote APIs are subject to the performance of not only the systems that they're talking to, which have their own performance curves to think about, but also the network that you're communicating over.
If you're not maintaining persistent connections to those API servers, you may also be subject to things like repeated TCP handshakes, TLS negotiation, and TCP slow start, other things that slow down your communication. If you're talking to a remote cache, this adds processing time to the initial execution time as we wait for our cache key to be acknowledged by the cache server. It has a lot of the same considerations as the API case, because a remote cache is technically an API, but subsequent interactions with the cache can be improved. Talking to the file system can also be slow, especially if we're running on cloud infrastructure where the disk might not actually be on the same machine that your application is running on. We typically think of disk access as being instant, because the latency between your application and the disk when it's on the same machine is measured in a few microseconds, whereas it might actually be one to three orders of magnitude longer on cloud infrastructure. If your application uses a database of any sort, you may need to take into account the performance characteristics of that database. Some databases optimize for read over write performance. You might have indexes that you need to take into account, and how those impact both read and write performance. But how do we distinguish between these two concepts when we use such a loaded term like performance all the time? Saying "it's faster if we do it this way" doesn't actually communicate what you mean. Half of the people you're talking to will probably choose one of these two definitions, how long it takes to run once, and half will choose the other, how many times it can run in a given time period. So when we discuss how long it takes to do something once, we can use the word latency instead of performance. And when we discuss how many times you can do something in a given time period, we can call that throughput.
And this gives us some vocabulary to be more concrete about this nebulous term of performance. Which one of these matters to you is based entirely on your needs at any given time. There are a lot of factors that could make you choose one or the other. If you're concerned about, for example, the volume of data you have to process versus your capacity to process it, then you might want to focus on throughput and optimize for throughput. If your customers have to sit there while something happens, latency might be the thing you want to focus on. And even then, there can be diminishing returns. For example, if it takes 2 to 10 milliseconds of additional latency to offload some particular operation to another machine, that might mean you take 60 milliseconds instead of 50. If you're looking at raw numbers, that's a 20% increase in latency, and that's going to seem slow. But if you're freeing up 50 milliseconds of processing time on that machine, then it's actually a throughput gain despite being a latency loss. And nobody besides a professional StarCraft player is going to notice that extra 10 milliseconds of latency. Also, that 50 milliseconds of CPU time might not seem like a lot, but when you're doing it on the scale of millions of those jobs, that's when it starts to add up. This is actually the basis of distributed computing, where you start delegating tasks off to other machines. And notice I mentioned at any given time. This means that once you've chosen a primary metric you're concerned about, it's not set in stone. You can't really optimize for one thing at the expense of all else forever. A lot of times it comes down to a matter of scale. Take the part of your system that receives the most traffic; in a lot of large systems, this might be your user identity or authentication service, whatever you call it.
If that has enough capacity to handle even your largest volume spikes, then you likely want to prioritize latency over throughput, because adding throughput capacity doesn't actually net you any benefit. When that's no longer true, because your marketing department has just started crushing it recently, then you may need to start optimizing throughput. So that's communicating about performance. The fact that I stumbled through that is hilarious, because I'm talking about communication. There are a lot of different ways we can understand communicating about performance, and a lot of it is about removing ambiguity. If you want to optimize your communication, then removing ambiguity is one of the biggest factors. Next we're going to talk about trade-offs, and there are a lot of different trade-offs you can make. You might be trading CPU consumption versus RAM consumption. You might optimize for small data sets or large data sets. You might look at caching a result versus recalculating it. Latency versus throughput, like we talked about. Readability versus throughput; we're going to go through that in a little bit. That's going to be great. Performance under load versus performance at idle, because those can be very different. The CPU versus RAM thing is a pretty huge deal in data structures and algorithms discussions. If your application's performance depends heavily on a few algorithms that are used everywhere, then it might be worth looking into whether those algorithms optimize for space or time efficiency, which is really just a fancy way of saying they optimize for RAM or CPU. For example, one place you might have to choose between those is whether you process a file line by line from disk, which can be slower depending on a lot of factors, or read the whole thing in as one big blob and then split it on lines.
Another place this choice shows up is memory allocation within Ruby itself. Within MRI, Matz's Ruby Interpreter, memory allocation is done by asking the operating system for entire 16-kilobyte pages of memory, and with some exceptions, all of your Ruby objects live within one of these pages. MRI allocates and garbage collects from these pages rather than calling the underlying system calls, known as malloc and free, for allocating and freeing memory, because those are slow to keep calling over and over. So rather than doing it for each object, what Ruby actually does is allocate a whole batch of RAM up front and then work within that batch on its own, because it can optimize allocation within those pages in ways it can't optimize individual malloc and free calls. That's a trade-off they've made, because even though it ends up using more memory, and a lot of that memory tends to be pretty sparse a lot of the time, from outside the process looking in, it might look like you're using gobs of RAM, when really a lot of that RAM is just reserved within your Ruby interpreter. Memoization is another big one. Usually this isn't so much a trade-off as a technique to keep from having to recalculate every time. Memoization is just a fancy term for caching at the object level. Rather than running the same computation over and over, you run the computation once, store the result in an instance variable, and the next time you call that method, you just return the instance variable. One of our next trade-offs is optimizing for small versus large data sets. And this is something that we're all going to get wrong a million times from now until we retire.
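The memoization technique described a moment ago can be sketched in a few lines. The `Report` class and its `@calls` counter here are just illustrative, to show that the expensive work runs only once:

```ruby
class Report
  def initialize(orders)
    @orders = orders
    @calls = 0 # tracks how many times the expensive work actually runs
  end

  attr_reader :calls

  def total
    # ||= evaluates the right-hand side only when @total is nil or false,
    # so the sum is computed on the first call and reused afterward.
    @total ||= begin
      @calls += 1
      @orders.sum
    end
  end
end
```

One caveat: `||=` re-runs the computation if the memoized value can legitimately be `nil` or `false`; in that case, check `defined?(@total)` instead.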
So we might use something that scales well for large data sets, but if we're running it on a small data set, it might not perform well. Or we might use something that's optimized for small data sets, but then it doesn't scale well when we have to run it in production against data sets that have millions of elements in them. These things are important to keep in mind when working with any amount of data in production: most of your data sets, most of the time, are going to be small, but not all of them, and even some that are small most of the time may not be small all the time. So we might have to take a graph like this, where we're trying to choose between two algorithms based on how they perform at different data set sizes. Say one of these two algorithms performs really well with a small data set and one of them performs really well with a large one, and there's a very clear intersection point where they cross over. You have to think about: do I need to optimize for the small case or the large case? Which do I see the most in production? Another question that might come up, and should come up, in these discussions is why the one that performs poorly with small data sets matters. In some cases it doesn't. But if the one optimized for large data sets is used a lot with small data sets, then it can actually impede your performance in production. So which do you end up choosing? Just use both. When we look at that intersection point, we can see where one outperforms the other: on data sets that are smaller, we can use the blue line, and on ones that are larger, we can use the green line that scales better. MRI does this internally in several places.
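The "just use both" idea can be sketched as a wrapper that picks a strategy based on data set size. This is a toy illustration, not MRI's actual code; the class name and threshold are made up, and the point is only the switch at the crossover:

```ruby
# A toy lookup table that switches strategies by size: below a threshold
# it scans a flat array of pairs; above it, it builds a real Hash for
# constant-time lookup.
class HybridLookup
  THRESHOLD = 8 # illustrative cutoff, chosen arbitrarily for the demo

  def initialize(pairs)
    if pairs.size <= THRESHOLD
      @pairs = pairs     # small: flat array, linear scan
    else
      @hash = pairs.to_h # large: hash table
    end
  end

  def [](key)
    if @hash
      @hash[key]
    else
      pair = @pairs.find { |k, _| k == key }
      pair && pair[1]
    end
  end
end
```

Callers never know which strategy is in play; the crossover point is an implementation detail you can tune with benchmarks.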
For example, when a hash has three keys or fewer, it stores them inside a flat array instead of the more complex data structure it uses internally for key resolution, resolving a key into a value. With a sufficiently small hash, it just iterates through until it finds the key; it doesn't actually use the constant-time lookup. But once you have more than three keys inside a hash, it starts to use the more efficient hashing algorithm, because iterating through a hash with 150 keys is not going to be as efficient as iterating through a hash with one or two. Also, when you call sort on an array, you might be running one of two different sorting algorithms based on the array's size. Under some threshold, and I don't actually remember the threshold, sorry, it'll use a merge sort, whereas on a larger array it'll choose a quicksort instead, because at those sizes, at least in the implementation that exists within MRI, those particular sorting algorithms typically work better. Big shout-out to Vaidehi Joshi for walking through the C code in MRI to help me figure that out. So the next thing, and I didn't even put a transition in there, all right, cool: the first time you use caching, it might feel like you've found the answer to all of your performance problems. Caching versus recalculation is another big thing we might want to optimize for, and it can feel deceptively powerful. Unfortunately, caching is not free. There is a trade-off involved. For example, we talked about in-memory caching earlier. The cost for that one is memory. If you don't have the necessary RAM to spare, then in-memory caching maybe isn't the right move. And if you're using remote caching, this one's always fun: remote caching costs a little bit of time.
And this is an odd statement, because you're probably using caching to eliminate the time cost. But caching is more about mitigating costs than eliminating them. In fact, all optimizations are more about mitigating costs than eliminating them, but caching especially. If your app's network latency to the remote cache is 10 milliseconds and it takes 10 milliseconds to calculate that value, have you saved any time? Fun fact: maybe. It turns out that ping time to the cache server isn't the only factor in determining this. The cache server, while under load, might take several milliseconds to return a response, turning that 10 milliseconds into 20, for example, when it would have only cost you 10 milliseconds to calculate from scratch. But it could save your application from spending 10 milliseconds of CPU time, which is different from wall-clock time. I probably should have made that distinction earlier in the slide deck. If you can push off CPU time to something else, eliminating that CPU time cost, it might be worth the wall-clock time cost for throughput reasons. So if CPU time is precious in your app, that might be worth the additional network latency. And when it comes to caching, the most cliche joke we have in software is that there are only two hard problems remaining in computer science: cache invalidation, naming things, and off-by-one errors. Another fun fact: if invalidating your cache is a matter of determining your cache key, these are actually the same problem. Cache invalidation becomes naming things, because you have to name that cache key. But no one ever likes that joke. I've told it a million times, and no one ever laughs. I think it's amazing, but I always laugh more at my own jokes than anybody else does anyway. Thank you to the three people that liked my joke.
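The CPU time versus wall-clock time distinction is easy to see with Ruby's standard Benchmark module, which reports both. A sketch: an IO wait (simulated with sleep) burns wall-clock time but almost no CPU time, which is exactly the kind of cost you can afford to pay when offloading work.

```ruby
require "benchmark"

# Benchmark.measure returns a Tms object with CPU time (total)
# and wall-clock time (real) broken out separately.

# An IO wait consumes wall-clock time but almost no CPU time.
io_bound = Benchmark.measure { sleep 0.2 }

# A tight loop consumes CPU time roughly equal to its wall-clock time.
cpu_bound = Benchmark.measure { 2_000_000.times { |i| i * i } }

puts format("io:  cpu=%.3fs real=%.3fs", io_bound.total, io_bound.real)
puts format("cpu: cpu=%.3fs real=%.3fs", cpu_bound.total, cpu_bound.real)
```

Trading a small wall-clock cost (network latency to a cache) for a large CPU saving shows up here as real going up slightly while total drops to nearly zero.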
So adding caching can also add frustration as you figure out how to invalidate your cache, because you don't want to return stale values. Usually when you try to avoid stale values, you end up invalidating too frequently, because your cache can't grow forever. You can't just continue shoving data into that cache and expect it to keep making your app fast, so you have to figure out how you're going to delete keys. There are a lot of different eviction strategies that your cache server might support. The most common of these is called LRU, least recently used. It's the standard one you're probably going to get if you didn't configure a different strategy in your cache server. The idea behind it is that the longer it's been since you last tried to fetch a key, the more likely it is to be evicted from the cache. There are several other useful ones; another that's very commonly useful is least frequently used. When you start reaching the capacity of your cache, that's when these eviction strategies come into play. Your cache hit rate is another big factor. No matter how good your latency to the server is and how fast your cache responses are, the first time you make a request to the cache, you're paying both costs: the request time to the cache server and the calculation time to calculate the value that you then shove into the cache. When you consider that you're doing this for every single cache key, it gets expensive unless you're returning cached values more than some percentage of the time. With rare exceptions, you might want your cache hit rate to be over 90%. Ideally, it would be over 99%. If your cache hit rate is below 99%, you may want to see if you're invalidating too much. If it's below 90%, just pull it out. Get rid of the cache.
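The LRU idea can be sketched in just a few lines of Ruby, since Ruby hashes preserve insertion order. This is a toy in-process cache, not how Redis or Memcached implement eviction; deleting and re-inserting a key on a hit moves it to the back, so the front key is always the least recently used.

```ruby
# Minimal LRU cache sketch built on Ruby's insertion-ordered Hash.
class LRUCache
  def initialize(capacity)
    @capacity = capacity
    @store = {}
  end

  def fetch(key)
    if @store.key?(key)
      value = @store.delete(key) # cache hit: re-insert to refresh recency
      @store[key] = value
    else
      value = yield              # cache miss: compute and store
      @store[key] = value
      # Over capacity? Evict the front key, the least recently used one.
      @store.delete(@store.first[0]) if @store.size > @capacity
    end
    value
  end

  def keys
    @store.keys
  end
end
```

A least-frequently-used variant would track a hit counter per key instead of relying on insertion order.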
Stale cached values are a very big frustration, especially in development, because you may not realize you're hitting the cache. You may think you're getting fresh data, and debugging that has a cost. Both the cache hit rate and cache invalidation work together; they're just as important together as they are apart. If you have a high hit rate but you're returning the wrong data, then your cache hit rate is pretty meaningless. If you invalidate too eagerly, it'll reduce your cache hit rate and may cause you to boost your cache capacity unnecessarily. So there's usually a lot of tuning involved when you do use caching. Readability versus throughput is another big trade-off that we make. Developer time is not free. In smaller companies, especially early-stage startups, the time your developers spend on developing your product might be your most expensive resource. Being able to read and understand that code, especially for debugging purposes, will be much more important, especially in the early days, than how fast it runs. On the flip side, suppose your app processes such a high volume that a developer spending a week of their time optimizing something gives your company enough of a performance boost to generate that developer's entire annual salary in increased revenue, but the code is less readable. That might be a worthwhile trade. For example, the abstractions that ActiveRecord provides are awesome. First of all, they're amazing. But they can also generate significant processing overhead, especially on complex join queries. Sometimes you may need to bypass ActiveRecord, just remove it, and drop into raw SQL.
This can actually open the door to a different style of query, like a common table expression, which I couldn't possibly think of how to explain in a slide, so I didn't, sorry, but they are a different type of query. If a join query wasn't the right move, but it was the only approach that ActiveRecord provided, then maybe removing that abstraction can give you a performance boost that way. Abstractions do have a cost, and sometimes while optimizing, you may need to remove the abstractions and start using some of the underlying concepts directly. Again, this is not a knock on ActiveRecord. It's really handy, and for queries without joins, it's kind of amazing. But just like every other piece of code ever written, especially by me, there are situations where it can get in your way. Now, performance under load versus performance at idle: when you're doing any sort of benchmarking or performance analysis, you definitely want to run it against the same workload you're likely to have in production. Sometimes this means testing performance in production, because the performance of a piece of code might be awesome when you're running it locally in development, but in production you might have an entirely different performance profile. We might end up tuning performance based on how it runs locally, but when we push it up to production, all of a sudden it's not the same level of performance. Anecdotally, at one company where I worked, a CPU-bound task that ran in three seconds in development took 20 minutes in production, simply because I hadn't considered that it was not the only task running in production. In fact, it wasn't even the only instance of itself running. So you need to keep your production workload in mind when you're performing any sort of optimizations.
And that runs through the big three overarching concepts we wanted to talk about today. So, in summary: when you have a complex system, there are no simple explanations for performance. Things you do in one part of your system can affect things that happen in entirely different parts of the system, even if they don't seem related. Not all optimizations are created equal. There's no silver bullet for performance. The answer to every question when it comes to production-level performance is "it depends; what else are we doing?" There's no one-size-fits-all solution, and for every optimization, there are scenarios where it doesn't actually make sense. Latency and throughput are separate metrics, and it's important not to conflate them. Caching is somehow both the best optimization ever and also the worst optimization ever. Production workloads are vastly different from what we see in development, and we need to keep production workloads in mind. And for every optimization we apply, we need to measure, tune, and then measure again. You want to record metrics in production for any optimizations you want to apply. If at all possible, do it early, before you think you need it, so you have some kind of historical data to go on. You'll probably just want to throw money at metrics-reporting services like Librato or New Relic. Get as much data about your app's performance as possible before you start making any sort of decisions about optimization. On that note, that's all I've got. Again, I'm Jamie Gaskins. Please feel free to talk to me about any of the things you saw today, and any other questions or comments you've got. Thank you so much, everybody.