live. Thank you for joining us again in this new year. Hope everyone's doing well. Without further ado, I'm going to jump right in here. Holly Cummins has agreed to join us today to talk about polar bears and something about Java, and she's going to tell us all about it. How are you doing, Holly? Yeah, very well. A little bit about Java, a little bit about performance, but you know, mostly about polar bears, right? Mostly about polar bears. Polar bears are a good topic. Especially when it's cold. It's cold and gray in Atlanta right now. If you could see out my back window here, it's all snow. Would you like us to interrupt you with questions and chat or do you want to hold them till the end? How do you feel? Yeah, let's interrupt with chat. I mean, you can interrupt me with questions, you can just interrupt me. I'm happy either way. Okay, yeah, we'll probably have a bunch at the end, but if somebody puts something in the chat, we're happy to interrupt your train of thought. Yeah, I mean, that's better than just, you know, a big block of me talking. So, I don't know if we've had you on before. Maybe a little bit about yourself before you get going. Sure. So, I'm Holly Cummins. I've been with Red Hat for about two years now. I work in the Quarkus team, and so I do all sorts in the Quarkus team. I work a lot on ecosystem and I work a little bit as well on performance, looking at the various aspects of Quarkus performance. But I've had quite a mixed career. I've been a consultant. I worked on application servers, but one of the things that I did when I was first starting my career was I worked as a performance engineer on the JVM. So, my job was to try and make garbage collection go faster and, you know, sort of tune the JVM's garbage collection itself. So, I sort of stuck with that performance theme through the rest of my career. Oh, that's a good background. How does the Quarkus thing work with performance? Is that something on top of what you're doing? I mean, on top of the JVM when you're thinking about that, or is that like just the compiled stuff? I'm sorry, Jeremy, I cut right over you. I do it all the time. So, it's like a horrible bad habit. So, how does that relate to Quarkus, I guess is where I'm going. So, with Quarkus, I mean, we do a lot of work on performance with Quarkus because going fast is one of the things that we really want Quarkus to do. And I'm not the one who's doing the main Quarkus performance work, I should say. There's some really clever people, like Francesco Nigro, who are just making improvements all across the board. But a lot of what we end up doing when we look at Quarkus performance is actually looking at all of the libraries that we're built on and how they interact, but then also just what's going on in the library. So, we end up doing a lot of performance-related pull requests to places like Netty, because Netty is somewhere in the Quarkus stack. And so, if Netty goes faster, we go faster. But then, of course, if Netty goes faster, then quite a lot of the world goes faster as well. So, I think it's a really nice example of the power of open source that because we're built on these open source frameworks, we have the ability to make them go faster, and then we benefit and everybody else benefits as well. That's a very cool point. Yeah, because Netty is pretty ubiquitous, right? Yeah, yeah, completely. Yeah, what I was going to say earlier, so I was at Boy Scouts, right?
I had a conversation with one of the other dads waiting there outside of Boy Scouts. And he brought up this thing that a long time ago, like, performance really, really mattered because on the mainframe, if things chewed up, you know, resources, you paid more money for that. And we, you know, in the Java world, in most of my career, we never had to worry about that, right? Because memory was cheap. Yeah. And like, you just use all these servers with a lot of memory. And now that people are moving back into the cloud, memory is a thing again, right? And you're looking at your resources. Yeah, completely. You've just anticipated my first eight slides or so. But exactly, there you go. But you're right that there is always a trade-off, because it used to be that hardware was incredibly expensive, and developers were relatively cheap. So you wanted to get the most out of your hardware that you could. And you were willing to spend a lot of development time to do that. Now, hardware is pretty commoditized, pretty cheap. And developers are pretty expensive. So the balance has shifted, because writing performant code takes that optimization effort. And then it's also maybe not as readable, well, in some cases, you know, when you go for those really hardcore performance optimizations, it's maybe not as readable. And so then you do kind of say, well, I need to be pretty strategic in where I optimize and where I don't optimize, because I don't want to have this hyper-optimized mess of a code base that nobody can maintain, because developers are expensive. But I'll scooch in actually, because I have a slide on this exact thing, because although, as you say, I think the balance has shifted now, and we maybe think of optimization as something that we do a little bit less, because our frameworks take care of it for us, our platforms take care of it for us, it still is really important. As people making things, we still do have to be making sure that, by whatever technique we do it, that thing performs in a reasonably good way. And the reason is because as developers, we build things to be used by people. And people get cross if stuff is slow. And we can really quantify this: even small performance impacts can have quite a big sort of knock-on effect on the experience of users. So like many years ago, Google found that if they had half a second extra on the time it took them to return search pages, the traffic dropped by 20%. So that's, you know, a huge, huge drop. And Akamai did some research, and they found that if they had an extra 100 milliseconds of latency on their page loads, then the conversion rate was 7% lower. So again, you know, we're seeing this sort of direct commercial impact, this direct money impact from performance. And both of those studies were like five or 10 years old now. So I think now user expectations are so much higher. And you would never tolerate an extra half a second on your search page. You'd just be like, I have other search engines I'm going to try. So the impact would be a lot more than 20%. And then of course, those are kind of indirect commercial things, where we make users annoyed and then they take their business elsewhere. But of course, for some industries, there's like a really direct correlation between the performance and the money. So if, for example, you're working in high frequency trading.
And of course, if you're working in high frequency trading, hopefully this isn't your introduction to performance optimization, hopefully you've already got it sorted out. But those high frequency trading platforms, they are so sensitive that just, you know, like a 10 millisecond delay in the trading platform will give you a 10% drop in revenue. So it's super strongly correlated. So then you say, okay, clearly optimization matters, you know, I can tie this to the thing that my company gets measured on, I should care about it. But what exactly is it? Fundamentally, optimization is making it go faster. But, you know, that's just like four words and it sounds really easy. But it's not actually that easy, because then you have to ask follow-on questions, like, for whom is it going faster? And when is it going faster? You know, what time of day is it going faster? And what are they doing? Because although performance and optimization sound like just one thing, there's actually a whole bunch of different things that we could be trying to optimize. And of course, if we optimize the wrong one, then we just wasted a ton of time. So performance can be throughput. And I think that's probably what most of us maybe initially think of when we think of performance. So that's like your transactions per second. It can also be latency. And so that's things like, for an application, maybe it's the startup time, or definitely we think of latency in terms of the response time. And then also for something like Java, when it first starts up, the JIT hasn't really done its thing, and it can be a bit sluggish. And that's true for actually all applications, but it's just more noticeable for Java. So we also need to think about things like ramp-up time. So if you get up to full speed faster, that can improve your throughput, it can improve your latency. So it's sort of between the two. And another aspect of performance that matters a lot is capacity. So that's how much of the thing can I do. So that's, for example, something like bandwidth. And of course, when I do this talk online, I always think a lot about bandwidth and I care a lot about bandwidth because, you know, sometimes my bandwidth isn't high enough. And then the whole talk is a bit of a disaster because there's no bandwidth. But in a cloud context, another aspect of capacity that we care about a lot is footprint. So if I'm paying for one virtual machine, how many of my applications can I host on that virtual machine? Because that's going to affect my costs, like you said, Jeremy. And another sort of related thing that you might not think of as a capacity thing is CPU usage. So if my application takes 100% CPU, I can only run one of it on a machine. If it takes less CPU, I can run more of it. But as I mentioned, it gets complicated, and it gets really complicated. So when we think about capacity, we want to lower our CPU usage because we want to make sure that we can potentially fit as many instances of the application on a box as possible. But the other thing that we probably want to optimize is our utilization. And so to optimize our utilization, we want to increase our CPU usage. And so sometimes what performance engineers will be doing is trying to drive the CPU usage of an application up, because that means that they're using the system more efficiently. So it gets complicated.
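To make the distinction between those metrics concrete, here's a toy sketch, not a real benchmark (for anything serious you'd reach for a harness like JMH): it times a dummy operation many times, then reports overall throughput alongside p50 and p99 latency, which are separate numbers that can move in different directions when you tune things.

```java
import java.util.Arrays;

public class ThroughputVsLatency {
    public static void main(String[] args) {
        int operations = 10_000;
        long[] latencyNanos = new long[operations];

        long runStart = System.nanoTime();
        for (int i = 0; i < operations; i++) {
            long opStart = System.nanoTime();
            doWork();                                  // stand-in for a real request
            latencyNanos[i] = System.nanoTime() - opStart;
        }
        double elapsedSeconds = (System.nanoTime() - runStart) / 1_000_000_000.0;

        Arrays.sort(latencyNanos);
        double throughput = operations / elapsedSeconds;           // ops per second
        long p50Micros = latencyNanos[operations / 2] / 1_000;
        long p99Micros = latencyNanos[(int) (operations * 0.99)] / 1_000;

        System.out.printf("throughput: %.0f ops/s, p50: %d us, p99: %d us%n",
                throughput, p50Micros, p99Micros);
    }

    // Dummy workload; in a real test this would be the code path you actually care about.
    private static void doWork() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            sb.append(i);
        }
        if (sb.length() == 0) {
            throw new IllegalStateException();
        }
    }
}
```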
And you know, the other thing about these things is, you know, just like capacity and utilization, where you seem to have these contradictory goals, for a lot of the others there's a tradeoff. So a lot of us will sort of instinctively expect that, you know, maybe if you improve your latency, then your throughput might go down. So if you have something like natively compiled Java, that's often the tradeoff that you're getting: that super rapid ramp-up time, but at the expense of maybe lower throughput. And these tradeoffs are, you know, something that we've been dealing with as an industry for a really long time. So I don't know if you've heard the quote, never underestimate the bandwidth, by which they mean throughput in this case, because it's all so confusing. So never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway. And that, you know, was back from 1981. And I always thought this was like, you know, just 1981. So that could be VHS or 8-tracks. So yeah, that's a pretty funny saying. I never saw that before. How did you never? Because I'd seen it before. But I always thought it was one of those things that, you know, people just make up. But it's actually based on a true story. Because there were two NASA sites in Texas. And they had a data pipe between them. But that data pipe got broken. And so to communicate between the two sites, they had this, like, NASA station wagon. And they put all of the tapes in it. And they were, like, physically driving them down the highway. And you sort of think, oh, well, yeah, but that was 1981. But the thing is, this is actually still how a lot of data is transferred in 2024. So when Google transfers data internally, when it's, you know, these huge volumes of data, they don't send it on a wire. They put it onto a physical medium and they, you know, carry it to the other place. And yeah, and same for Netflix, because again, you know, Netflix, they're a data volume business. And so when they're doing a lot of their internal data transfers, I mean, obviously, when you stream a movie, they no longer come to you with a CD. But internally, they are still transferring the media on physical media. Because, you know, the thing with sneakernet is great throughput, terrible latency. And now I can't help myself but interject a dumb story. So in 1992, we used to stand in the data center and there was the spindle on the, you know, the mainframe. Think of it that way, not just tape cassettes. And we'd take the tape reel and see how far we could stand back and throw it and see if it would land on the spindle. So you accepted the mainframe never goes down as a challenge. Oh, yeah. I've got a mainframe going down story next, actually. Because the thing about performance as well is, like, even if you get it perfect, you have optimized this thing to within an inch of its life, you know, you figured out what you need it optimized for, you tuned it so carefully, and then requirements change, because the world changes. And, you know, this is exactly what you were talking about at the beginning, Jeremy, that things change. And so some colleagues of mine helped a client a while ago, it was a South African bank, and they had this mainframe system and it was designed for the tellers to use it at the end of the shift. And it worked great for that.
So the idea was that, you know, they had a small number of tellers and at the end of the shift, they'd push a button and then it would fire off the request. But then there was this handy API. And some other people in the bank had a mobile app. And they thought, hey, we could connect our mobile system to this handy API on the mainframe. But of course, the number of mobile users was hugely more than the number of bank tellers. And the frequency of transactions was way, way more. Because instead of being once at the end of a shift, I mean, you know how we use our phones, right? It's just scroll, scroll, scroll, refresh, refresh, refresh. And so this app was just sending requests back and forth and back and forth. And so it was just this sort of, you know, bombardment. And when we talk about performance and SRE and that kind of thing, we sometimes say slow is the new down. But of course, down is the old down and it's the worst down. So this mainframe, it just completely collapsed. Because the original system design, it was totally good for the context in which it was designed. But then the world changed and it hadn't caught up. And, you know, now we're seeing this really big change because you may have heard of the cloud. And in the cloud, you know, as Jeremy said, there's this really strong correlation between how much memory you use and your cost. So all of a sudden footprint is something that we need to care about a lot. And this means that, you know, that may affect a whole bunch of the technical choices that you make. And so when I was a performance engineer, I worked on the J9 VM. And many years after I left that team, it was open sourced, which was really cool to see. And then shortly after it was open sourced, I started to see all of this stuff on Twitter where people were saying that they had switched to J9 and their performance was way better, which, you know, I was pleased by because I used to work on J9 performance. And we always knew it was a really performant runtime. But if you dig into it a little bit more, what had actually changed was that cloud footprint thing. The characteristics that J9 had were, you know, I think it had a lower footprint, but as well, what it had was this faster startup time. And so in the cloud, you care about footprint. And you also really care about startup time. And so J9 started faster than HotSpot. And it had, yeah, it had that middle chart, it had that much smaller footprint after startup. But there's always the tradeoff, right? So if you look at the right hand chart, what you can see is it got to peak performance faster, but peak performance was lower. And we see a kind of a similar tradeoff, but like turned up to 11, with GraalVM. So with GraalVM, you can have this astonishingly tiny footprint. And you can have this astonishingly fast startup time. But the tradeoff is that you will probably have slightly lower throughput. And so Quarkus has done a lot of work to integrate libraries so that they're easy to use with GraalVM. And so I heard a story, this was before I joined the Quarkus team, that we were at a conference and we were at a booth. And one of the booth staff was, I think, maybe a project manager or a product manager, and not super, super technical. And so every time someone would come to the booth, they'd say, hey, do you want to see Quarkus? Do you want me to show you how fast Quarkus and GraalVM start up? And the person would be like, yeah, cool.
And it was only at the end of the day that they realized that they had never actually thought to shut down any of these applications. And they had 120 instances of this application running on their machine. But the cool thing was it didn't actually matter, because the footprint was so tiny, their machine just carried on. I'm thinking how this is relevant to a problem that I faced last year where a bank, to use your example, needed to start things up. Certain use cases were first thing in the morning. Most people check their balances around lunchtime, but aren't actually doing any banking. They're just checking their balances so they can go to lunch. They might go to an ATM, do a withdrawal, and then do their online banking later in the day. But that shifts based on what time zone you're in. So how could they prep, have most of the services actually off, and then prep by warming them up, ready for everyone checking their balances, and then tune them back down, shut them back down later in the day. And do it based on time zones, based on use cases across time zones. So that saves a lot of money. Yeah, completely. I think something that, as an industry, again, we need to be going towards is just supporting much greater elasticity. And there's a few ways you can do it. So like, if you have something serverless, then that's ultimately elastic, but then there's some trade-offs with serverless. The other thing that you could do is you can have, like, an auto-scaling algorithm, but it may not respond fast enough. You can, like you're saying, Rob, when you know the pattern, and you know it's at lunchtime, you don't know what time zone lunchtime is in, but you know you are going to see these peaks and troughs, then you can have something that's a bit more predictive, either through machine learning or through manual training. But another pattern that I really, really like is kind of like the equivalent of cloud bursting, except for sort of workload runtimes instead of hosting regions. So the idea is that you think you're going to run most of your workloads on Quarkus on JVM, because Quarkus on JVM is going to have, actually, yeah, so with Quarkus on JVM, you still have a better startup time than you would if you weren't using Quarkus, you still have a better footprint than you would if you weren't using Quarkus, but then you also have this better throughput, and then the developer experience is better as well. So it's sort of this really nice all-rounder, but it's not going to be the same as GraalVM in terms of, you know, how tiny the footprint is and how fast it starts up. So you have the bulk of your sort of, you know, your day-to-day on that. And then you have a sort of a scale-up and scale-down with Quarkus native on GraalVM. So the idea is when you get a spike in workload, you can respond to that spike so fast, because GraalVM starts so fast, that you don't need to keep that capacity always up. The sort of the one caution with that, which is kind of interesting, is that what we find is GraalVM will start in like 15 milliseconds. It's just ridiculous. But then sometimes the sort of the surrounding framework, like your serverless framework, or, you know, your whatever, is actually going to have a much slower startup time. And so even then Quarkus isn't the bottleneck, but you still do need to deal with the bottleneck in terms of your elasticity.
But often, even if it's a second, you know, that's enough to allow you to have good enough elasticity. Well, so one thing we often talk about with GraalVM is the ability to compile down to native code, right? But does this, I mean, is it fast enough so we don't really have to worry about that? Or does native give us much more of an advantage? Um, I'm not sure really. Okay. Yeah, it's a good question. Let's come back to that one. I'm going to interject there just from an edge perspective, because I do a lot with edge customers. You know, usually they're not on Java, but from an edge perspective, if I can compile things and have a smaller size and a faster startup, I get two things. One is when I want to do an update or push something down to the device, I'm not using bandwidth, right? And I've had customers where it takes an hour to get something updated because they're pushing something from the US to Singapore or whatever, right? Yeah. Then the startup time is important because normally on an edge device, you think of it as, I have software that needs to be up and running all the time. Well, that is true for the main thing. But if I want to, say, filter data, upload data, check for updates and do some other things, I might want to do that when my main event isn't happening, right? So I might have a 20 minute window to do something, to upload data or whatever, and I want to leverage the resources I have available at that time. So I'll start something up fast, run it like a batch job, and then shut it off before the 20 minutes is up, because I know that the event, a train going by or something, is going to happen, right? Yeah. So size matters there and startup time matters. And I know I'm going to shut it down when I'm done because I'm going to reclaim those resources for the primary software that's doing its job. Yeah. So edge, that would be a big important thing when we talk to people doing stuff there. Yeah, definitely. I always get really excited by Quarkus on edge because they just seem like such a natural fit. It does seem. Yeah. Yeah. And there's sort of, like, I think for some scenarios, you definitely do want to be going to that GraalVM. But what we see a lot of times is people get really excited by GraalVM because it seems so miraculous. But then they look at the trade-offs and they're like, oh, well, actually, this isn't really the trade-off I should be making. And Quarkus on JVM is kind of amazing. And what's completely counterintuitive is that there is no trade-off, because you kind of wait for the, you know, okay, well, which am I trading off? And it's like, well, it goes faster and it's smaller. So like, why wouldn't you? And I think normally, when you're doing performance tuning, it is always about the trade-offs. But I think there is occasionally this magic scenario where actually there is no trade-off, and what you've found is you've found waste. And so you're able to eliminate the waste. And that's what a lot of Quarkus is doing. And of course, then, you know, well, what is waste? Why is this waste here? And the waste that Quarkus is getting rid of is that Java historically has been super optimized to be really dynamic. And in the cloud, like, you're not patching your server live in the cloud, you're just pushing up a new image. So you don't need that kind of dynamism anymore, because we make more decisions at build time.
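As a loose illustration of what moving decisions to build time means in code, and this is only a sketch of the principle, not how Quarkus actually wires things up: the dynamic style resolves a class by name with reflection at every startup, while the build-time style ships a registry that was computed ahead of time, so startup does no scanning or reflection (which is also part of what makes it friendly to native compilation). The class names and handlers here are hypothetical.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical example: a registry of request handlers.
public class HandlerRegistry {

    // Dynamic style: figure out the wiring at runtime, every time the process starts.
    static Object createHandlerReflectively(String className) throws Exception {
        return Class.forName(className).getDeclaredConstructor().newInstance();
    }

    // Build-time style: a code generator (or just the developer) has already worked
    // out the wiring, so at runtime there is nothing to scan and nothing to reflect over.
    static final Map<String, Supplier<Object>> HANDLERS = Map.of(
            "greeting", GreetingHandler::new,
            "health", HealthHandler::new);

    public static void main(String[] args) {
        HANDLERS.values().forEach(Supplier::get); // plain constructor calls, no lookup cost
    }

    static class GreetingHandler {}
    static class HealthHandler {}
}
```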
And so, you know, then that really is waste, because there was this trade-off of dynamism versus performance. And it's like, well, we don't need the dynamism. So we have somebody from chat who suggested that if it's a longer running process, you also get better throughput on the JVM. Yeah, for sure. Yeah, for sure. So your ramp-up is probably going to be faster with native, but then, you know, the longer it stays up, the more... Did you see who that was from, Holly? I didn't know... It's from Eric DeAndrea. Yes, nothing but a troublemaker, right? Hey, Eric. So just to sort of, you know, summarize, because this is a question we get a lot in terms of, you know, choosing which Quarkus you should use and which is faster. You know, you've got your stack, you've got your application and then your Quarkus and then your GraalVM, or, you know, you could have your application and your Quarkus and your OpenJDK. For ephemeral or serverless scenarios, definitely go with GraalVM. If you're running your application for a long time, exactly as Eric says, maybe go for OpenJDK. But the other problem with requirements is that, like, often, you know, we think we know what the requirements are, and we actually don't, so then we optimize for the wrong thing. So for example, how often do we optimize for the behavior at idle? And when you optimize for the behavior at idle, what you're really optimizing for is the application that you worked so hard on not being used. And that's depressing. So, of course, nobody wants to imagine that they're writing a thing that doesn't get used. So we don't optimize for that. But realistically, like, we probably should, because about 30% of virtual machines out there in the world are what's known as zombies. So these are just machines that are sitting there running, and they haven't done anything useful in the past six months. So really, you know, when you think about that kind of scenario, I mean, it's a depressing scenario, but the lighter you can make your application, the better it is if it's depressingly not used. So how do you, assuming you want to optimize, what do you do? So again, you know, a bit like making it go faster, this seems really easy. The thing to do is to find the bottleneck, and then you fix it. And, you know, what could possibly go wrong? There's a whole bunch of pitfalls here. The first pitfall, interestingly, is intuition. I love intuition, but performance optimization is not the place for ideas. You've really got to be guided by measurements. And it's more than that as well. You can't just measure things. You've got to measure the right thing. And what that is, is what your users care about. But it's really easy to just measure things and then assume that we're data driven. And this is called the McNamara fallacy. I was super interested when I learned about it. It's basically, and it's a good trick to use actually, you know, in meetings and that kind of thing, that if you can show numbers, even if the numbers don't really make a whole bunch of sense and you measured the wrong thing, people will assume you know what you're talking about because you have numbers. We all have this kind of, you know, we love numbers. But measurement theory is really interesting. And there's this whole idea of leading indicators and lagging indicators. And this is actually something, this is like a business thing, a management thing, but it's super useful for us as techies.
So the idea is that the lagging indicators, these are the things we care about, like how much money is my business making? They're super easy to measure. How much money is my business making? I just look at what revenue is coming in. But the only problem is they're hard to change. How do I make more money? And then we have leading indicators. These are really easy to change. I should, you know, buy everybody MacBooks. I should, you know, whatever. They're predictive of a thing we care about. If I buy everybody MacBooks, then we'll make more money. But the problem is that they're really hard to identify. You can't figure out what is the thing that you should be changing to change the thing that you care about. So it's tricky. And I wanted to show some examples of how these work. And I should say, like, with performance, you know, we all want to be guided by other people because none of us have enough time. These performance experiments are for entertainment purposes only. You could try these at home, but please do not use these to tune your production system, because your context will vary. Your answers will be different. So way back 15 years ago, I did an MSc and my MSc thesis was about garbage collection. And one of the things that drove me crazy was that you keep seeing this bad advice. And you see it in like books and everywhere, that says, ah, to optimize your application, what you want to do is you want to reduce the amount of time spent in garbage collection. And this is just such bad advice, because garbage collection, it's an investment. And so garbage collection can make your application go faster. And you can see this, like, just by, you know, trying things out. And so back in 2007, I did this performance experiment. And the two lines here are how much time you're spending in GC. So with the blue line, every single time it did a garbage collection, it compacted. That is a ridiculous thing to do. The GC overhead in that scenario was huge. But you can see when you look at the throughput, the blue line has a slightly higher throughput. So even though we're spending like twice as much time in GC, we actually made the throughput better. Because compacting rearranges the heap and it makes object access faster. So I thought, I will recreate this experiment for this talk. So I went and I found the DayTrader benchmark. And then I wired it up with JMeter. And I started doing my experiments. So the first thing I did was I set it up like I had done back in 2007. That didn't work. So then I made a few changes, and that didn't show me what I wanted. So I tried setting the heap. That didn't work. So then I tried, you know, setting the heap to a different thing. And no matter what I tried, the performance stayed exactly the same. And of course, this is totally cheating. This is exactly bad science. When you do an experiment, you should not be trying to adjust the experiment until it shows what you want to show. That's not science. That's, yeah, that's cheating with numbers. But I didn't care. I just wanted to write my talk. So I kept changing things around. And eventually I got a setup that showed the effect that I wanted to show. So I had, you know, just fairly simple command line options. And what I could see is in the first scenario, I spent 21 seconds in garbage collection. So about 4% of my time was in GC pauses. In the other case, I spent 12 seconds in GC.
So it was about 3.6% of my time in GC pauses. So I'd lowered the GC overhead. And then you can sort of go on and you can look and say, okay, why was the garbage collection overhead lower? It was because in the bad case, I collected 24 gig of garbage. And in the good case, I collected 13 gig of garbage. So I spent twice as much time in GC because I did twice as much work. And at this point, you should be starting to think, okay, actually, I'm not sure lowering the GC overhead was such a good idea. And if you look at the transactions per second, you can see exactly that: I have my throughput, and I have my garbage collection. And again, you don't know which way around the correlation is here. But, you know, definitely it's not what you expected. And this is a super good example of lagging and leading indicators. So naively, you think that GC time is a leading indicator for the thing that you really care about, which is the lagging indicator, which is the throughput. But if you try and improve your throughput by lowering your GC time, you're going to be going in the wrong direction. It's not the right leading indicator. But then as well, you probably have to think a bit more deeply about it too, and say, well, wait a minute. Is that even the right lagging indicator? Like, here I've optimized transactions per second, because it's what I care about. Well, it's what I think I care about. But maybe I should be optimizing latency. Maybe I should be optimizing something else. So you know, you do these experiments and you have these numbers. But you know, you're not necessarily optimizing the right thing. And I haven't said as well what I changed in order to make that really big behavior difference. When you're doing performance testing, everything always says you must not run your load generation on the same system as the application that you're testing, because otherwise the load generation consumes resources. It turns out, in 2024, laptops are fast. This was just my laptop, which is again a terrible place to do performance experiments. My laptop was so fast that the bottleneck was the network. So the bottleneck was the network off to JMeter on a different machine. As soon as I put JMeter on the same machine, my throughput doubled. So again, I wasn't really measuring what I thought I was measuring at all. I was measuring the impact of the network, not anything to do with GC. And again, this comes back to this idea that you've got to find the bottleneck as your first thing. Because if you're making improvements somewhere that isn't the bottleneck, in this case, the bottleneck was the network, it didn't matter how I tuned my GC. It's just an illusion. The other thing to really bear in mind is that, like, time kills all performance advice. Even my performance advice, you know, it's not going to be the same now as it was then. So for something like ZGC, you know, that's new. And it has these pretty amazing pause times and a pretty small trade off. So we always used to say, don't do concurrent GC, because your performance trade off just isn't going to be okay. With ZGC, it is a pretty okay trade off. But there still is a trade off, because your memory consumption is a lot higher. And I don't have good numbers for how much higher the memory footprint is with ZGC. But I know with Java 21, they added generational ZGC.
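If you want to check your own GC overhead rather than guessing, here's a minimal sketch using the standard GarbageCollectorMXBean API; GC logging (for example -Xlog:gc on a modern JDK) gives the same information in more detail. The workload here is just a hypothetical stand-in that churns out garbage.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcOverhead {
    public static void main(String[] args) {
        long start = System.nanoTime();

        // Stand-in workload: lots of short-lived allocations so the collectors have work to do.
        long checksum = 0;
        for (int i = 0; i < 5_000_000; i++) {
            byte[] junk = new byte[256];
            junk[0] = (byte) i;
            checksum += junk[0];          // keep the allocation from being optimized away
        }

        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        long gcTimeMs = 0;
        long gcCount = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcTimeMs += gc.getCollectionTime();   // cumulative milliseconds reported by this collector
            gcCount += gc.getCollectionCount();
        }
        System.out.printf("checksum %d: %d collections, %d ms in GC out of %d ms elapsed (%.1f%% overhead)%n",
                checksum, gcCount, gcTimeMs, elapsedMs, 100.0 * gcTimeMs / Math.max(elapsedMs, 1));
    }
}
```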
And that reduced the footprint by 75%, which tells me that if you can reduce the footprint of something by 75%, you weren't going to like what it was before. It must have been pretty big, but I don't know. So I sort of covered a bunch of things there, but, you know, just as takeaways: GC can be an investment. It rearranges the heap, and that can make your application go faster. You need to find the bottleneck. Don't trust performance advice, validate it independently, and make sure you're measuring the right thing for you. And that brings us on to the next pitfall, which is advice. All of us want to learn from experts. And unfortunately, that means that we read the internet, which is not necessarily the place to find experts, particularly with time. So the internet is just filled with bad advice. So, like, you can still find things that say, you know, in Java you should make one big method, because method dispatching is slow. That hasn't been true for like 15 years. You will find people saying reuse your objects to help the garbage collector. Almost always, this is a terrible idea, because it really, really hurts generational garbage collectors, which is what you should be using. And then you find things like, if you want to tune your JVM, use this command line, and they just get cut and pasted. And Ben Evans has this lovely story from when he used to work at New Relic, because they could see all the command lines that were run. And for one of the parameters, you know, we always tend to do it like a power of two, 256 or something. And for one of the numbers, it wasn't a power of two. And they worked out that what had happened was one person somewhere on the internet had made a typo. And that had propagated to something like 17% of the JVMs that they had in production, this wrong number. But, you know, we all want to do it. And then you get advice as well, like, you know, you should never concatenate your strings with plus equals. You have to use a StringBuilder. And then this is where it gets really complicated, right? Because that is actually sort of correct. But it's sort of not correct, because now the JVM will optimize it. And the optimizations are getting more and more clever. So again, you know, you can't just trust these things that you read on the internet. You have to try it out. Because time ruins all advice. And this plus equals is the sort of optimization that a lot of us really like doing. So like if we get a code review or something like that, and we see our buddy has done plus equals, we'll be like, oh, yeah, you should use a StringBuilder. But we don't necessarily know if it's actually going to make it faster. We just think it will. So I decided to try it out again. I did another slightly dubious experiment. So everybody knows that string concatenation is slow if you do it in a loop. So I made a huge loop, and I did loads and loads of string concatenation. I went through my benchmark, and I found a place where it was doing string manipulation. And I changed it so that instead of doing a little bit of plus equals, I put it, you know, in a huge loop. And I tried out my benchmark expecting to see the performance fall. And there was no difference at all. None. So I was really depressed. And I tried to figure out what had gone wrong. It turns out the method that I was worsening, de-optimizing, was a toString method. It never got called. So it didn't matter how crappy or how amazing the code in there was, it never got called.
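In the spirit of for entertainment purposes only, here is a toy version of that experiment you can run yourself. Plus-equals in a loop really does recopy the string on every iteration, while a StringBuilder appends in place; outside a loop, the compiler and JIT already do the StringBuilder rewrite for you, so the numbers only diverge in the loop case. It is not a proper benchmark (no warmup, single run), so treat the output as illustrative.

```java
public class ConcatDemo {
    public static void main(String[] args) {
        int n = 30_000;

        long t0 = System.nanoTime();
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;                      // builds and copies a brand new String every iteration
        }
        long plusEqualsMs = (System.nanoTime() - t0) / 1_000_000;

        t0 = System.nanoTime();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);                // amortized append, no per-iteration copy
        }
        String s2 = sb.toString();
        long builderMs = (System.nanoTime() - t0) / 1_000_000;

        System.out.printf("+= in loop: %d ms, StringBuilder: %d ms (lengths %d / %d)%n",
                plusEqualsMs, builderMs, s.length(), s2.length());
    }
}
```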
And we do this everywhere. It's just such a natural thing that we want to optimize. And we tend to optimize the things that are easy to optimize, even if they don't necessarily have very much impact. So, like, I think a lot about carbon. And when I travel places, I will always try and take public transport to and from the airport. And sometimes it's a lot more expensive. Sometimes it's a little bit scary. But you know, I kind of think, well, every little helps. But actually, the carbon footprint of me taking a taxi to the airport, compared to the carbon footprint of the plane, is so small that even though there is still a tiny benefit to me taking the bus, every optimization that you do is taking your time from another optimization that might have more impact. So there's probably things I could do with that time that would have more impact. And also, psychologically, you know, it makes you feel like, I have optimized, but you didn't actually measure to see whether your optimization was having an impact. And often, you know, going back to the conversation that we were having at the beginning about whether, since developers are expensive, we should be spending time in optimization, the answer is often we shouldn't. Because if you have code like this, the JVM, oops, sorry, if you have code like this, the JVM historically hasn't been able to optimize it, but I think now it maybe can. Code like this, where it's not in a loop, the JVM will optimize for you. The JVM will turn it into StringBuilders under the covers. And the people who write the JVM, they have so much more time for optimizing than any of the rest of us do. And so you want to take advantage of the optimizations that they're making. And the best way to take advantage of that is to write really average, normal code. Occasionally, you will have to write code that's icky. So, like, with Quarkus, we found a really interesting bug with instanceof checks. And we had to get rid of a whole bunch of instanceof checks and replace them with much ickier code, because we'd identified a JVM bug. But that was because we were working at a really low level, and because we had the data to tell us that this was something we had to do. Otherwise, normally you shouldn't. And again, you have to be guided by the data. So in Quarkus 2.0, we actually went through a similar sort of thing. We took out a whole bunch of plus equals and we replaced them with StringBuilders. And it was because with an older version of GraalVM, there was a performance bug that we found, which meant that all of a sudden all of those plus equals were a performance disaster. So we had to go through and take them out. But again, that was a very specific scenario where we had an outside change that meant that we saw a performance regression. And so we fixed it. So you've got to be guided by the data. Another trap that I think a lot of us fall into is this love of the shiny. And so for example, for us in the Java world now, the shiny is Loom. And we all want to go to Loom because we imagine it's going to fix all of our performance problems. Loom is really good at waiting. So switching an application to use Loom if the application does a lot of waiting is a great move. Loom tends to not be so good at CPU-bound tasks. So if your application is CPU-bound, Loom is not going to give you anything. And the other thing is that Loom can interact really badly with libraries.
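Here's a minimal sketch (assuming JDK 21 virtual threads) of the pinning gotcha described next: blocking while inside a synchronized block means the virtual thread cannot unmount from its carrier thread, and running with -Djdk.tracePinnedThreads=full will print a stack trace when that happens, which is the kind of JDK tracing option referred to below.

```java
// Run on JDK 21 with: java -Djdk.tracePinnedThreads=full PinningDemo
public class PinningDemo {
    private static final Object LOCK = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.ofVirtual().start(() -> {
            synchronized (LOCK) {            // monitor held...
                try {
                    Thread.sleep(500);       // ...while blocking: the virtual thread stays pinned
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        vt.join();
    }
}
```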
So one of the gotchas with Loom is that if a virtual thread, which is what Loom gives you, these virtual threads, if it gets pinned or if it does a long CPU process, everything sort of on its carrier thread will grind to a halt. And what we're seeing is that some libraries will end up pinning threads because they weren't designed for Loom. So for example, our colleague Mario Fusco recently put in a performance fix to Jackson, going back to what we were talking about at the beginning about, you know, interacting with the ecosystem, because it pinned the thread and then that interacted really badly with Loom. So do try Loom, but just be aware Loom is not a magic faster thread. Loom will only help you out if thread management is your bottleneck or if you need some sort of better programming model for thread management. And again, you know, at the risk of repeating myself, you have to measure, don't guess. For Loom in particular, a useful measurement tool is that there is a JDK option to trace pinned threads. So the pinned threads are the things that are going to absolutely kill your application performance if you're using Loom. So you want to turn that on at some point in your testing. If you're using Quarkus, you have an option so you can have an @ShouldNotPin annotation on your tests. And that will give you that nice fast feedback cycle, so that you get the failure at the unit test stage without having to read all of the traces, to say this has to not pin because otherwise it's a performance disaster. If you did want to know more about Loom, my colleague has written a whole bunch of really good articles about it. That's worth looking at. But we still haven't really talked, we've talked about all of what not to do. We haven't talked about what to do. So when you're doing any kind of performance optimization, you're going to need tools. Historically, we've talked about performance tuning tools. Now I think there's a really nice intersection with observability tools, because you cannot optimize it unless you can observe it. So observability helps a lot. Performance tools, they fall into a few categories. So you can have things like your method profiler. So things like VisualVM or Mission Control, or IntelliJ has a nice profiler built in. If you're using OpenJ9, there is a nice profiler called Health Center, which I mention because I wrote it and I'm proud of it. More generally as well, flame graphs are super useful for optimizing. You're also going to want to look at your GC. Again, a tool that a lot of people still use is GCMV, which makes me happy because I wrote that one too. You're going to want to do some heap analysis, probably. The de facto tool for that is Eclipse MAT. If you want sort of more of a one-stop shop for all of this, then you can go and you can look at a category of tools called APM. So that's application performance management, I think. Most of these, unfortunately, are not open source, unlike a lot of the others that I just mentioned, but there is one called Glowroot that is open source. I haven't used it personally. Or you can go to something like your New Relic or your AppDynamics or your Dynatrace. Another cool trick that I mention just because I read it yesterday and I thought it was cool is that there's an easy way to generate flame graphs, which is using async-profiler and the ap-loader and JBang.
So if you can start your class with JBang, then you can just point the Java agent at it and then you'll get your flame graph. So that's nice. And this was, if anybody's doing the billion row challenge, this was the method that Gunnar recommended for doing that kind of micro-optimizing of those small applications. Of course, we all know that micro-optimizing is something that we shouldn't do. And in a microservices world, if you optimize just one application, I think that probably counts as micro-optimizing. So we do kind of need to look at distributed tracing as well so we can get the big picture. So there you're going to want things like Zipkin or Jaeger, or now all of those have really been subsumed into OpenTelemetry, because you do need to have that whole system context to know what to optimize. And you also really need to think about the outliers. So there's a thing where performance experts don't tend to look at the averages. They tend to look at the distribution and how bad the distribution is, because that's where people get really angry, it's when they have a really long wait. And of course, the other thing is that when we're in the cloud, if we have something that's just a little bit wrong in terms of its performance, that can be a really big cloud bill. So I can breathe. So while you're taking a breath there, yeah, tools like Sysdig and Datadog, they use something called eBPF. And eBPF, you know, extended Berkeley Packet Filter, people think of it as networking, but those tools use something called user-level statically defined tracing, so that I can catch all the GC events in your JVM in user space, basically, in Linux anyway. The point is, maybe you can answer this now or at the end, what are your thoughts on those kinds of tools? Because even though the tool itself runs blazingly fast, because eBPF is really designed to catch kernel events, what you're really doing is, you know, you're hooking into and seeing every little event that's going on inside the JVM. And if you're running something like on a Kubernetes cluster, you think of each host node now running many JVMs in those containers, and think of all the things that you're hooking in order to do that. And is there a penalty for that? You know, what is the price for observing something? I mean, like when I'm debugging stuff, most of Jeremy's bugs are, you know, Niels Bohr, the physicist, most of Jeremy's bugs are Bohr bugs. You know, they're easily repeatable. All mine are heisenbugs. I can't repeat anything because I'm observing it. So, but when I'm doing performance analysis and I hook something in, invariably observing the performance of it impacts the performance of it. So what are your thoughts on that? Yeah, I mean, it's a really tough thing. And I think the answer falls into two categories, you know, same as everything. One is that there are tradeoffs. So at some point, you're going to have to decide how much do you want to have that observability, how much do you want to have that really low friction access to performance information, because that's going to have a big technical benefit, against how much you want to go fast, because that has a technical benefit. And then the other side is look for the low-hanging fruit. When you can get something for free or almost free, take it. And so, going back, I showed Health Center really briefly on one of the slides. And one of the things that it did was, I thought, so cool.
It wasn't my idea. But the JVM is already profiling your application in order to know what to optimize with the JIT. And what a lot of profiling tools will then do is come along and add an extra hook to re-collect that information. And usually they will have to, because that is so expensive, narrow down where they're going to look. We can't collect across the whole application, so let's just look in this package. But that's going back to that first pitfall of intuition, of, I don't trust the developer who wrote that package, so I think that package is where the performance problem is, so I'm going to only profile in that package. So then you don't know all of the other things that might be the bottleneck that you're missing. So the broader the net that you can cast, at least at first, the better. And so with Health Center, because it was reusing information that was already being collected and just exporting it out of the JVM, and that did have a cost, but it was only a really small cost, it meant that you got your performance information for free. So where you can do that, you want to do that. But sometimes you can't. And then you're stuck back to trying to do those trade-offs, and also to try and get your feedback loops as small as possible, which is a slightly different thing. But I was just looking at a talk recently, and I haven't watched it yet, but it was talking about getting incremental performance regression testing just into your daily workflow. So that if you are catching these problems really early, because performance testing is so quick, then that means that you can maybe afford to have a bit less observability in production, because you have fewer things that escape to production. Whereas if you're only discovering your performance problems in production, you're going to have to invest a lot in terms of overhead in having that full observability rig in production. So Eric, I'm going to change subject here for a second, but Eric was answering some questions about some perceived optimizations. And earlier, you spoke about old wives' tales about things that used to happen in the past. And I've seen people tell me, if you only have one line of code in your loop, don't put curly braces, because it's faster. Or in order to optimize a switch statement, put the things that are probably going to get called first at the top. Those are kind of nonsense optimizations at this point, from my perspective, but are those the things you're referring to when you were talking about stories that don't hold water over time? Yeah. I think a lot of these things, they're either appropriate for one context, but not for others. So I saw a talk once, and I can't remember, it was either Martin Fowler or Grady Booch or someone like that. And they were talking about how they were optimizing their code to be lightweight. And by lightweight, they actually meant physically light. They needed to minimize the number of bits, because each bit had a weight, and this code was getting shipped to the moon. So for that context, they really had to do things like making the variable names short and all of those things. And actually for JavaScript, then, you still have that same sort of thing. But for this really specific scenario, there were optimizations that they wanted to make that the rest of us shouldn't worry about.
So that sort of one category is that we look at what people are doing in a really performance-sensitive context and we apply it to us, who are not in that context. And then the other thing, as you say, is that stuff changes. So probably once upon a time there was a compiler where braces would cost you cycles in terms of your execution time. But that was a long time ago, and we still kind of have that tribal memory of it. And then we pass it on even though it's wrong. And then things change fast as well. Like the JVM, every version, the performance characteristics are different. So optimizations that the JVM didn't used to make five years ago, it does make now. So you just have to always be guided by measurements and really invest in having that performance testing setup that is really idiot-proof and easy to use, so that you can validate these things instead of having to kind of go off the, oh, well, it's too hard to test performance. Or, I can only test it in a microbenchmark, because microbenchmarks usually give you the wrong answer. So I'm just going to go off what I read on the internet. And one other thing before we jump back to the bears. So what about lambdas? Somebody in chat asked about lambdas. Are lambdas less performant than writing old-fashioned full classes or putting that in other methods or classes? Oh, that's a good question. I think that's probably another one where things change. So I remember when lambdas first came out, some performance experts were looking at the performance of them and they found that they were pretty, or certainly you could use them in a way that was pretty catastrophic for performance. And there were some fun talks that went round of, like, look at this disaster that happened when I switched to lambdas. I think probably a lot of that has gone away now and there's a lot more optimizations. But yeah, I do sometimes wonder that as well. Like when, you know, it's sort of inflating a stream and then deflating it. And, you know, is that, and it's going to depend on context a little bit. So for some things, when you can deal with a stream, you're actually going to be better off, because you were never getting to the end of the stream. And so if you deal with it as an object, that whole object has to be sort of constructed, whereas if you deal with it as a stream, then you save a bit of pressure on the garbage collector. So for some cases, you're definitely going to be better off with a lambda and a stream. For other cases, I suspect you'll be worse off. But it's probably going to be quite hard to find a good answer. Because when you look for it, you will find answers from 10 years ago when there were all these performance disasters for lambdas. But yeah, super good question. I haven't thought about the performance of them for ages. So I don't know. I will not keep us from the bears any longer. I know, I know. Yeah, because we talked about bears at the beginning, and then I talked and talked and talked, and we didn't get to the bears. And then I finally got to the bear slide. And then we just kept talking. We didn't talk about the bears. So where do the bears come in? The bears come in because, I don't know if, you know, kids everywhere are the same. I suspect they are. But mine will not turn off the television. So they'll leave the room and the television's going, and it drove me mad. And so eventually I slightly snapped.
And I told them that if they left the TV on when they weren't using it, they were a polar bear murderer. So it wasn't one of my proudest parenting moments. I don't necessarily recommend that as a communication technique, because they were slightly traumatized. But it definitely is true that, you know, maybe not that specific thing, but it is true that we have a climate situation, and that as IT, we definitely have a part to play in the problem, but then also in the solution. Because we tend to think of, you know, flying as super bad for the environment, but data centers use about the same amount of energy as flying. And so we need to be thinking about how we can reduce that energy usage. And so when we're doing performance optimization, that is almost always also carbon optimization. So going back to that framework about leading and lagging indicators, the carbon is the lagging indicator. That is the thing we care about. Carbon is hard to measure, but there are some leading indicators that are easier to measure. So cost is a leading indicator for carbon, and performance is a leading indicator for carbon. And I want to just give, like, a case study of what we saw when we measured carbon with Quarkus, just to sort of see how those leading and lagging indicators work. So when we wanted to look at the carbon impact of Quarkus, we didn't have the measurements, but we did have an older experiment that we'd done where we looked at the cost impact of framework choice. And so we could see, you know, that if you run Quarkus on the cloud, you can run in a smaller instance. And so then that means that your cost per month is much lower. My colleague Clement did this experiment, and he wanted to really set it up as, like, a real-world experiment. So instead of doing, like, a microbenchmark, he had it running for 20 days and then he got his cloud bill, and that was the end of that experiment. But we were able, from that, using some data sets that allow you to map from your instance and your region to your carbon, we were then able to get, you know, an estimate, but a pretty good estimate, for the carbon impact. And we could see that again, you know, with Quarkus, your cost was about a third and your carbon was about half of what it would be if you were using another framework. So it's, you know, it's not a perfect correlation, but it's a pretty strong correlation, and lowering your cost reduces your carbon. And so this correlation, it's called the economic model. And so it tells you that the cost and carbon metrics are, you know, going in the same direction. But we wanted to do some more precise experiments that didn't rely on this data set. So we did it on-prem, because then we had access to the full instrumentation to allow us to measure energy. And we looked at Quarkus on JVM, Quarkus on native, the other framework on JVM, and the other framework as a native application. And on this, the lower lines are better. So Quarkus on JVM has the lowest carbon. And the other thing to note on this chart is that, like, some of the lines end before the others; the shorter lines, that's where the throughput maxed out. So like on native with the legacy framework, we could not get above, you know, 9,000 transactions per second or something. So you can really see that the length of the line is a really good measurement of the throughput.
And you can see that there's, again, that really strong correlation. We thought native would have a lower carbon footprint than JVM, but because native has that lower throughput, it actually had a higher carbon footprint. So, you know, the higher line is the worse carbon footprint. The shorter line is the lower footprint. And there's the correlation. And so I've called this the Vroom model. And in hindsight, this was a really terrible name because the SEO of Vroom is awful. And so every now and then, you know, people will come to my website and they'll try and look up the Vroom model. And because it's got a nondeterministic number of Rs and a nondeterministic number of Os, they can't find it, which I've sort of slightly fixed now, but it took me a while. So naming is the hardest problem in computer science. But the takeaway is that Quarkus on JVM has the smallest carbon footprint because it has the highest throughput. There's this really nice correlation, which is good because we want to use the thing with the highest throughput anyway. So, I mean, just as a general principle, right, we shouldn't be wasting electricity, we shouldn't be wasting hardware. And so if we do the performance optimization to not waste these things, then that makes us overall good. So, you know, I think optimization is something that all of us can do and should do. And it can be really entertaining. When you do your optimization, be guided by measurement. And optimization does have a cost, so only optimize what matters. Don't get sucked into the micro-optimizations. And with that, I think we've got, well, in fact, we don't have any time for questions. You went way over, but we can have some questions anyway. There's great stuff in this presentation, like multiple things that I think I have takeaways from here. So really cool. I think we'll drop some things in the chat. One from Holly, maybe, that's an older video just from a few months ago on sustainability. And another one from our friend Markus Eisele over in Germany on sustainability. Yeah, he's been doing some really good writing on the subject. Yeah. And Holly, my wife bought me a book with that title, Naming Things: The Hardest Thing in Software Engineering, because I helped her do some code a while back and she didn't like the way I named things. It is completely the hardest problem. Sorry. So thank you very much. If there's any other questions, we'll find you. And everyone has access to Jeremy's and my email. So I think that they can send us something if they need to. So thank you for... Yes, thanks for chatting. And yeah, thank you very much. Really, really great talk today. So thank you. My pleasure. And we'll tweet and post on our social media when we have a recording available. So thanks for joining us and happy new year.