that's the only reason I do this. I get paid by people who do this: DevOps, tools, observability, monitoring, stuff like that. That's the advert finished. Right. I do some open source stuff; actually, everything we work on is open source, but I do some other stuff as well. And computer program optimization, making things go faster, is a passion. It's basically my video game. Some people sit and shoot at zombies for several hours; I count cycles and make things go faster. So that's where I'm coming from. I've been working in Go for three and a half years now, and I'm going to tell you some stuff.

So let's see, what have we got in the room? Who actually read the abstract of the talk? Okay, so you already know the punch line; you could leave now and make room. So, everyone's working with Go, right? Nobody's putting their hand up. You're lazy. Who's heard of Prometheus? It was mentioned a couple of talks ago. I'm going to use examples from that. You don't have to know anything about it, but I always like to know who's heard of the work I do. What am I going to cover? Stuff, stuff, stuff. Okay, let's get going.

So, the three most important things. Everyone's got paper and pencil? Oh, the slides are online, so you don't really need that. Anyway: the most important thing you need to do when you're optimizing your program is to measure things. Do not start optimizing until you know what the program is doing. The second most important thing is to still measure. And the third most important thing is to keep on measuring. Always measure, measure, measure. Never just go and change the code because you think it's going to be faster, because most of the time you're wrong, and you're just going to waste your time.

What do I mean by that? First of all, measure big things. People sometimes post a profile online where something is taking 21 milliseconds or so, and, well, who cares? Unless you're a high-frequency trader (I used to work in electronic trading; I hate those guys), unless you're actually paid by the microsecond, measure big things. Also, the Go tooling: the profiler samples 100 times a second, so don't tell me about something that takes 21 milliseconds, because that's within sampling error. Measure things that you could literally stand there with a stopwatch and time. And if you don't have anything like that, run it a million times, and then it will take an appreciable amount of time [a minimal benchmark sketch follows below]. So that's the first thing, measure big things, because otherwise the effect just gets lost in the noise.

Measure all the time. You're going to miss it if you're not measuring. And okay, I'm standing here as a guy who sells measurement and observability systems, but it really is useful to have things like your CPU usage recorded all the time, so you can see when things changed.

The other tool I got into is Jaeger. Who uses Jaeger? One. Wow, cool. There's a bunch of tools like this: there's Zipkin, and Dapper is the sort of daddy of this family. I used to write things like this myself when I worked in electronic trading.
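[Note: "run it a million times" is exactly what Go's benchmark harness automates. A minimal sketch; parseLine here is a made-up stand-in for whatever you want to time:]

    package demo

    import "testing"

    func parseLine(s string) int { return len(s) } // stand-in for real work

    // The harness picks b.N itself, growing it until the loop runs
    // long enough to time reliably, often millions of iterations
    // for a cheap function. Run with: go test -bench=.
    func BenchmarkParseLine(b *testing.B) {
        for i := 0; i < b.N; i++ {
            parseLine("1970-01-01T00:00:00Z cpu=0.42")
        }
    }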
But the basic idea, let's do technology here: you've got a timeline, a horizontal bar illustrating how long everything took, and then a hierarchy breaking down how long things took within that. And the point is: don't go optimizing your Go code if it's not actually the Go code that's slow. To a first approximation, most programs spend all of their time waiting on I/O. So don't go hacking around in your Go code if it's waiting for I/O, because it's not going to make any difference. This talk is about what happens after you measure, measure, measure; after you've looked at the traces and you're utterly sure that it is using CPU time in your Go code. Now the talk starts.

Okay. So what do you do? Oh, the talk didn't start yet. I'm going to draw examples from this thing, which is Weave Cortex, our distributed time series database. You don't have to know anything about it other than that I'm using it in two different ways: I use it to draw these charts, and I also speed it up. I speed up the analytics engine, I speed up the ingestion of data. Hopefully that's not too confusing. If you want to see the specifics of what I've been doing, go to github.com, find Weave Cortex, and look at all the PRs with my name on them, because most of them are about speeding this thing up. We ingest tens of millions of time series in real time from lots of different customers, and that's why we need to go fast.

Right. Okay, now the talk starts. Profiling. Everyone knows this, right? Hands up if you're already an expert on this slide. Maybe you're just lazy, maybe you don't want to put your hands up. Okay, so this is the simple start; go read the blog, this is the bit I'm not going to cover. Run your program under the profiler, run the pprof tool. And the bit at the bottom: if you do this, then you will have an HTTP endpoint on your production system where you can go and grab a profile, which is really useful [a minimal sketch follows at the end of this passage].

Here's a profile. Here's what you get out if you run that go tool pprof command. And this is a quiz: who knows what the problem is? Okay, silence. Sorry? Someone suggests it only profiles when it's being tested; no, this is a profile from our production system. The answer is garbage collection. I highlighted the words and made some of them a bit bigger, but it is very common that you'll run the profiler and see these guys show up, things like runtime.gcDrain, in your profile. And you'll think, well, darn, I don't know anything about garbage collection, so I can't fix that code. And it's pretty complicated code as well. But do not fear, I'll tell you what to do about garbage collection.

There's a question. Do we have time? Yes, I will explain. The question was: how do I know garbage collection is the problem? What I mean is, if you see those characters, runtime.gcDrain, in that order, somewhere near the top of your profile, then you have a garbage collection problem. And I predict right now that you have a garbage collection problem, because this is a garbage-collected language. Anyway, let's do the rest of the talk, and come back to me if I didn't answer your question.
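[Note: the production HTTP endpoint mentioned above is the standard net/http/pprof pattern; a minimal sketch, with the port chosen arbitrarily:]

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // side effect: registers the /debug/pprof/* handlers
    )

    func main() {
        // In a real service this listener runs alongside your normal handlers.
        // Then you can grab a 30-second CPU profile from production with:
        //   go tool pprof http://localhost:6060/debug/pprof/profile
        log.Fatal(http.ListenAndServe("localhost:6060", nil))
    }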
So this is visualization. If you hook your Go program into Prometheus, you get this chart more or less for free. And this is a sawtooth pattern, right? The memory builds up, and then it goes bang, down, and builds up, and bang, down, builds up, builds up. It can be more complicated than that; the garbage collector runs along in the background and can be doing other stuff. But I want you to get the idea that it's doing this sawtooth pattern all the time.

Okay, now why is that interesting? Well, let's talk about memory. This is the standard architecture of a modern processor, a processor from some time in the last 20 years. The processor, the bit that's actually making decisions and doing things, is something like 100 times faster than your memory. There's a blog page, "Latency Numbers Every Programmer Should Know", where they actually write down what those numbers are in nanoseconds and so on. The point is that memory is orders of magnitude slower. And in order not to have the processor waiting around all the time, we build a hierarchy of caches, typically two levels (this diagram is not to scale). Typically the level-one (L1) cache is hundreds of K, and the level-two (L2) cache will be two, three, four megabytes. The L1 cache goes at the same speed as the processor, the L2 cache goes a bit slower, and the RAM is horrendously slower. So for your program to go fast, you want everything you're working on to be in the cache.

So, there's a picture of a cache. As you can tell, I'm a great graphic artist as well as a programmer. No, I'm not. But I tried to indicate how this works: the different colors are bits of memory being cached in the L2 cache, some of them also in the L1 cache, and the processor is actually only working on the memory in the L1 cache.

Now think about that sawtooth. The action of going through everything in memory and trying to figure out what's garbage and what isn't basically wipes the cache. Okay, maybe not absolutely, but that activity of trawling through the memory, and even just the activity of allocating more memory, pushes things out of the cache. (If I move out of the light, the camera cannot see me. Okay.) So I can't do graphic art and I can't do interpretive dance, but I do know how to make programs go faster, and that's what I'm trying to tell you. You don't have to understand everything about how a processor works, but I am trying to get across the idea that there are technical, physical reasons in the silicon why running around allocating memory, throwing it away, and letting the garbage collector clean it up for you will not only slow your program down because the garbage collector is doing work, but also slow the rest of your program down because it kicked everything out of the cache.

So let's look at some anecdotes. This is another one of the... oh, that's a memory profile. Yeah, sorry, that's important. Instead of saying -cpuprofile when you profile your program, if you say -memprofile you get a memory profile, and then you can ask for things like the number of allocated objects. The default you get is the objects in use right now, which is not the interesting number. So you need to use a flag like alloc_objects.
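[Note: concretely, the flags mentioned here look something like the following; exact spelling varies a little between Go releases, and newer pprof also accepts -sample_index=alloc_objects:]

    go test -bench=. -memprofile=mem.out   # write a memory profile while benchmarking
    go tool pprof -alloc_objects mem.out   # count all allocated objects, not just live ones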
And I like to do the cumulative view, which gives me a sort of top-down view of who's been allocating the most things. So I looked at this and I see a bunch of encoding; you can go look at the PR if you want to see the actual code. But if you want to speed this up, you need to stop creating garbage. Just as in the real world: reduce, reuse, recycle. Recycling is basically what garbage collection is, so don't do that one. Reusing objects, and just reducing the number of allocations, is the way to go.

So, a slightly more detailed profile. This one is alloc_space. This is a little bit of an unusual one: it's actually a little microbenchmark I did of one of the functions in the earlier profile. We were using this library called Snappy, which is a compression library, and it turns out that if you call its NewReader function, it allocates about 80K. Just one object, 80K. Think of that sawtooth being driven that much faster by that allocation.

This is the timing. I'm running a real call in our staging environment every five minutes and timing that one particular call. And this is the effect of releasing that one change, switching to a sync.Pool, which is built into the Go standard library for reusing objects that have a high cost of allocation. Putting that one change in, which is like six lines or something, takes it from nine-ish seconds to about seven and a half. That's about a 20% improvement. One allocation. [A sketch of the sync.Pool pattern follows at the end of this passage.]

I'm going quite fast because we're low on time. This next one is a case where a sort routine did its comparison by calling a function which did an fmt.Sprintf. The cumulative effect of this is to create a lot of garbage, and the sort comparison is called a lot. So instead of indirectly going through strings, I just wrote a comparison routine directly. Again, it's like 10 lines or something. That one got me another 20% improvement.

And actually, I should stress: this is a massive distributed data store, and this time includes the time to get the data off the disk. It is fundamentally bottlenecked on I/O, but I was making 20% improvements in the whole call by making effectively tiny changes to the Go code. So do really worry about your garbage.

Okay, stack versus heap. Everyone knows this, right? Who doesn't quite understand the stack and the heap in Go? Couple of people. Okay. First, a disclaimer: it's not the law that there's a stack and a heap. It just happens to be that way today in the implementation, and they could make improvements; some of the things I'm saying could become false in the future. But at the moment, it's pretty likely that if you do something like this, your variable will be on the stack, and if you do something like this, then your slice will be on the heap. So there you go, stack versus heap. Stacks are very neat. The only thing you can do is stick something on the top, so when you're finished with it, you effectively just move the pointer back down. You just say, well, I don't care about that stuff anymore: bang, gone.
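[Note: the real change pooled Snappy readers; this is a minimal sketch of the same sync.Pool pattern, with bytes.Buffer standing in as the expensive-to-allocate object:]

    package main

    import (
        "bytes"
        "sync"
    )

    // Get returns a previously released buffer when one is available,
    // instead of allocating a fresh one for the garbage collector to chase.
    var bufPool = sync.Pool{
        New: func() interface{} { return new(bytes.Buffer) },
    }

    func handle(data []byte) {
        buf := bufPool.Get().(*bytes.Buffer)
        buf.Reset()            // pooled objects come back dirty; clear old state
        defer bufPool.Put(buf) // hand it back when done

        buf.Write(data)
        // ... do the real work with buf ...
    }

[Forgetting the Reset is the classic sync.Pool bug: pooled objects keep whatever state their previous user left in them.]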
Heaps are different: you need to go carefully through the whole heap, trying to figure out which things you're still using and which things you aren't. So the stack is enormously faster to work with than the heap, and that's why you want to watch out for things that end up on the heap. How do things get on the heap? They escape.

Okay, quiz time again. And I sat there for half an hour making all the fonts bigger, because I saw in earlier talks that it was quite hard to read. Well, we won't wait for you to compile it in your head. So how would you find out what's going on? This is a benchmark: if you run go test -bench, Go will run your benchmark enough times that it takes about a second. So it ran my microbenchmark like 30 million times, and it said that every time around that loop, it's doing one allocation. So why is that?

Next thing: -memprofile, so we can see how many allocations. And then you can do -list. -list says: show me the actual lines and put a count beside them. So this is the number of objects allocated, and it's on this line that we're allocating an object. And it kind of doesn't look like we're allocating an object, because it's a constant string and there are no pointers here, and I didn't call make. So why is this on the heap?

The next stage in this analysis is to tell the compiler to print out its escape analysis: -gcflags. GC does not mean garbage collection in this context; it means Go compiler. -m says: tell me about the escape analysis. -m -m says: no, really tell me about the escape analysis. It's verbose: this one command, for that one ten-line benchmark program, prints out something like 100 lines, so I filtered them down. But they're all marked with which line in your program they came from. And this is the answer here: it is a parameter to an indirect call. That's why it escaped. buf is an io.Writer in my benchmark, which has scrolled off the top of the screen, but go back to the example: that's an interface, and that makes this an indirect call. If I'd actually declared buf as a bytes.Buffer, which is not an interface but a concrete type, then that parameter would not escape, and the program goes a lot faster. Take that as read. [A small reproduction sketch follows at the end.]

Yeah, escape analysis is my point. If you can't spot from the high-level stats why your program is creating garbage, -gcflags '-m -m' will tell you why the data is going to the heap. Generally: an address passed out of a function, that's the favorite example. You can't do this in C, but you can in Go: you can just return a pointer to a local variable, and it will escape to the heap. And parameters of indirect calls, which is what I just talked about; that's the things in the brackets, and the thing before the dot is also a parameter.

My time is up. So there you go. Look at memory allocation. It's always memory allocation. Thank you.
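[Note: the escape cases described at the end are easy to reproduce. A minimal sketch with made-up names; exact decisions can vary between Go versions. Build with go build -gcflags='-m -m' to see the compiler's reasoning:]

    package main

    import (
        "bytes"
        "io"
    )

    // The classic: returning the address of a local.
    // The compiler reports "moved to heap: x".
    func leak() *int {
        x := 42
        return &x
    }

    // Parameter to an indirect call: w is an interface, the compiler
    // cannot see the callee, so the argument escapes to the heap.
    func viaInterface(w io.Writer, p []byte) {
        w.Write(p)
    }

    // Concrete type: bytes.Buffer.Write can be analyzed directly,
    // so the argument can stay on the stack in the caller.
    func viaConcrete(b *bytes.Buffer, p []byte) {
        b.Write(p)
    }

    func main() {}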