So I'm going to be talking about influx-spout. This ties quite nicely into the last talk: it's a case study about building a high-performance message router with Go. It's maybe slightly higher level than the last one, but there are still a few nice bits of code at the end.

I'm Olli-Pekka Lehto, or just Olli for short; that might be easier. I'm a production engineer here at Jump Trading, originally from Finland, and I've been at Jump for one and a half years. I work in a few different areas, primarily our metrics stack, a lot of Linux stuff, and high-performance computing. My background is largely in managing very large high-performance computing systems, and along the way doing some coding with C, Perl, Python, and yes, even Fortran, which I actually think is quite a nice language. Since starting at Jump, one of the perks is that I get to work with Go, and that's really one of my favorite things these days.

At Jump we have a lot of servers, switches, applications, all kinds of interesting hardware, and it's all very geographically distributed. We have a very high velocity: we get new servers, we decommission them, we move them around, and new applications get added all the time. So monitoring and keeping track of all of that is quite difficult.

What we use to monitor it is the TICK stack: Telegraf, InfluxDB, Chronograf (which we actually don't use; we use Grafana instead), and Kapacitor. How many people have heard of the TICK stack? A few? It's quite popular for monitoring, and the nice thing is that it's all Go-based. Basically, we have a host sitting here with a Telegraf collector. It ships data from the host itself, from applications running on it, and from external devices like switches, so some hosts act as aggregators for the data. The data gets pushed over the network to InfluxDB, the time series database that stores it. Then we have Kapacitor, which can do analytics on the data and push results back into InfluxDB, but primarily does alerting on the time series data. So if we see a disk starting to fill up, we can trigger alerts based on that, and do a lot of very sophisticated stuff as well. For visualization we have Grafana, and I'm pretty sure most of you have seen Grafana dashboards somewhere.

It's all good. We can even do things like, in the shell, just echo a measurement name and a field like "count=10", and send it to the localhost UDP port where Telegraf has a socket listener. Add some metadata, for example hostname and location (which can be a data center, business unit, whatever), and a nanosecond timestamp, and with this enriched data it goes into InfluxDB. That really democratizes our metrics stack: basically any of our users who has an application can push pretty much any data into our metrics system. People really love it. That's the good news. The really bad news is that... yes, go ahead.

What do you mean you got rid of Nagios?

Well, it kind of complements it. We have a Nagios-style system, and this hasn't supplanted it yet. Kapacitor is more for when we see a trend in the measurement data and analyze that. The Nagios-type question, is it working, is it up or is it down, is kind of a different issue. And then there's the whole log analytics side, which is the third leg of this whole thing. I love this topic.
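To make that shell push concrete, here is a minimal Go sketch that writes a single line-protocol point to a local Telegraf socket listener. The UDP port (8094 is a common socket_listener example port) and the measurement and tag names are just assumptions for the example, not anything specific to the setup described in the talk.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Assumes a Telegraf socket_listener on UDP port 8094 (hypothetical port).
	conn, err := net.Dial("udp", "127.0.0.1:8094")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// InfluxDB line protocol: measurement,tag=value field=value nanosecond-timestamp
	line := fmt.Sprintf("myapp_requests,host=web01,dc=sgp count=10 %d\n",
		time.Now().UnixNano())
	if _, err := conn.Write([]byte(line)); err != nil {
		panic(err)
	}
}
```

The same thing can be done with echo and netcat straight from the shell, as described; the point is how little it takes to get an enriched data point into the pipeline.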
I could talk all night about metrics and monitoring, but I want to get to the Go part. This is kind of the introduction, so bear with me; there is going to be some Go stuff. But I need to make sure you don't start talking about Nagios. No, I can talk more about Nagios, definitely.

So as you can see, we're starting to get a huge number of hosts. That's not all our servers by any means, and not all our switches; it's just representative. And we have a lot of internal customers using this, people sending increasingly high-cardinality metrics, which means there are a lot of index keys in the data. Those produce a lot of individual series inside InfluxDB, which puts pressure on it; you can see poor InfluxDB sweating there. We have metrics with different service level agreements: some metrics are super important, some are more throwaway, with different retention policies and things like that. We have nice users, but sometimes they send absolute junk, and once you have junk in your InfluxDB, getting it out can be difficult. We also want people to experiment with stuff, and sometimes when an experiment goes wrong we get a ton of data. I do it as well, so I'm not just pointing fingers.

What we really wanted is this kind of magical message bus which sends things, based on some config management scheme, to a bunch of InfluxDB servers. Things we haven't checked go to a junk yard, kind of like a quarantine, and then we graduate them to the proper InfluxDB instances, where we can set up arbitrary metrics, with a bunch of Kapacitor instances running on the side. So it's a magic pony that eats the metrics from the hosts and poops them out on the other end.

So we started developing this thing called influx-spout. Initially it was the sort of monolithic application the previous talk described: a single process, written in Go, parallelized with goroutines. It worked fairly nicely, but we had limited scalability, and shutting it down was a bit disruptive. We did have high availability using keepalived, but overall it was fairly clunky.

For version two we started thinking a bit more about a message bus. At that time there wasn't really a nice solution for it, so we did this very hacky thing where we used multicast as a message bus, and we actually had it running on a single server, on loopback. That kind of worked. We managed all the components with supervisord, which is a bit nicer than systemd for this kind of work where you have a lot of components, and we could add listeners and writers to it. But we were starting to see some drops with bursty data. There's also the limitation that if we want to expand to multiple servers, we need an environment where multicast works consistently, which, if you're running things like Kubernetes, might not work the way you'd like. And we had one case of a multicast leak, where the backup started listening to the production traffic and we got double writes because they were on the same subnet. That was slightly embarrassing, but we caught it very quickly and no damage was done. So then NATS came along, and we decided, OK, we'll do a third iteration of this thing.
As was just said, NATS is a simple, lightweight messaging system with a nice publish-subscribe messaging model, and we're using the standard mode. There's also NATS Streaming, which enables reliable delivery, but we're still basically doing metrics on a best-effort basis. A lot of our collection is actually over UDP, yet we find that we get very few drops, so this is fine for us. There's also a nice Kubernetes operator available that lets you build clustered NATS instances inside Kubernetes, which is nice. I'm not going to go into NATS too much because that was covered really well, thanks; that was the perfect talk to precede this one.

Here's just an example of NATS's publish-subscribe model. We have the publisher, which publishes on something called a subject, subscribers subscribe to this subject, and the message gets replicated to all the subscribers.

What our influx-spout version 3 looks like is this: we have listeners that take the metrics, either over UDP or HTTP, and publish them to the NATS bus. Then we have a component called the influx-spout filter, which takes this ingress and filters it according to different rules. Typically we use the rule that each measurement (a measurement might be MySQL metrics or CPU metrics) gets put into its own subject, and we could also regex on things like the data center. The filter also removes bad timestamps, which have been quite annoying.

And then we have the subscribers. For example, for Kapacitor, which has a domain-specific language called TICKscript, we actually have a thing that parses the TICKscript and then sets up a custom writer and a Kapacitor instance inside our Kubernetes cluster. That way we can quite simply and dynamically set up these Kapacitor endpoints. We can set up InfluxDBs too: this one takes MySQL and CPU measurements, this one subscribes just to MySQL, and they land nicely in InfluxDB. So influx-spout is kind of a wrapper that takes NATS, which is at the heart of it, wraps it with this Influx stack, and makes it a lot nicer and more manageable.

Here are some performance metrics I took today. The green line is incoming messages and this one is outgoing. We're getting about 4,000-ish messages per second in and putting out about 25,000, but these are pretty big messages: the incoming rate is a bit over 100 megabytes per second and the outgoing is 300. So we're basically doing one to four; on average, every packet that comes in gets sent to four different InfluxDB or analytics endpoints. And as you can see, the performance is quite marvelous. This is a typical Intel Xeon Broadwell server, the load is very low, and the memory usage of the NATS server is absolutely ridiculous: it's like 35 megabytes, which is beautiful. The writers are very efficient as well; this is the one-minute load average. So we've found that we have plenty of headroom, and this is just using one NATS server. So yeah, I'm a big fan of NATS.

Do you have any way of load balancing it if you ever get to levels where you see blocking?

Yeah, as you can see, you can cluster NATS, so it quite easily supports clustering multiple instances, and we're starting to use that now. We're also moving from traditional big iron to Kubernetes, so there we have a bit more headroom and room to play around.
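To make the publish-subscribe model concrete, here is a minimal sketch using the NATS Go client. The subject name and payload are made up for the example, and the import path is the current nats.go one, which may differ from the client that was available at the time of the talk.

```go
package main

import (
	"fmt"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect to a local NATS server (nats://127.0.0.1:4222 by default).
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		panic(err)
	}
	defer nc.Close()

	// Subscriber: every message published on this subject is delivered here.
	done := make(chan struct{})
	sub, err := nc.Subscribe("measurement.cpu", func(m *nats.Msg) {
		fmt.Printf("received on %s: %s\n", m.Subject, m.Data)
		close(done)
	})
	if err != nil {
		panic(err)
	}
	defer sub.Unsubscribe()

	// Publisher: fire and forget, best effort, much like the metrics themselves.
	if err := nc.Publish("measurement.cpu", []byte("cpu,host=web01 usage=42")); err != nil {
		panic(err)
	}
	nc.Flush() // make sure the publish has reached the server
	<-done     // wait for the subscriber callback before exiting
}
```

Every plain subscriber on the same subject gets its own copy of the message, which is the fan-out that the filter and writer components rely on.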
And we can do things like having multiple listeners, each with affinity to one of the NATS servers, while the NATS servers have anti-affinity to each other, so they all run on physically different servers. You can do very cool stuff with this kind of microservice-type environment; that's another subject I could talk about for a very long time.

So, some lessons we've learned along the way. One thing that's been a pain is slow consumers. Have you seen this? Sometimes InfluxDB gets a bit slow: you might have database compactions running, heavy queries hitting the back end from Grafana (someone wanting to see the data for the last two years or whatever), high-cardinality metrics causing havoc, any number of things. The symptom is that the buffers start to stay full, and if that's left unchecked it can propagate all the way back to where the metrics are collected. One nice thing is that because we collect over UDP, it doesn't reach the clients that send the stuff; it stops at the listener. But we get a very high memory footprint.

One thing we did initially to combat this was to add deeper and deeper buffers. But if you have consumers that are perpetually slow, that buffer stays backed up, and because a Go channel is FIFO, you just get this big latency: some stuff gets dropped, and the stuff that does come through is constantly late.

Time for Kafka? Kafka does this very well.

It does, yeah. We were considering it. And NATS, if you're slow, is active about it; it has the alerting for slow consumers, so you can alert on the server side. We're not really seeing this much anymore, though; we've managed to fix it mostly on the server side. And because we shard the InfluxDBs into smaller units, we can prevent any one InfluxDB from becoming slow, which I think is the real way to solve this. But that said, it's good to know what to do if it ever happens. As for Kafka, while it's nice, I've understood that it's a bit of a beast; I haven't really used it. Yeah, it's a beast, and it's not written in Go, so boo. But yeah, it was a valid alternative. Right now we're totally happy with NATS.

The takeaways are that constantly monitoring buffer sizes is important, and so is keeping the buffer sizes reasonable. If the data is not super important, think about just dropping everything on the floor when the buffer gets too full. And test for slow consumers: have a mock InfluxDB back end that gets slow and run tests against that. I wish we'd been smart enough to do that from the beginning.

Logging and monitoring: comprehensive instrumentation is essential, as I mentioned, for example for these slow consumers. One good tip, if you're doing this kind of high-message-rate stuff, is to avoid any logging on the message path. You don't want a log message 100,000 times a second saying "message X was dropped"; that's terrible. You might want a debug mode that shows it, but even then you should be a bit smart about what it logs, maybe having several levels of debug. And use out-of-band monitoring: the nice thing about Go is that with a goroutine and a channel you can very nicely implement a monitor that's out of band. We're actually now moving it into a separate monitor component.
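Here is a minimal sketch of that goroutine-plus-channel pattern, not influx-spout's actual monitor: the hot path records a drop with a non-blocking channel send and never logs, while a separate goroutine reports counts periodically. All names and intervals are made up for the example.

```go
package main

import (
	"log"
	"time"
)

// statsMonitor drains counter events out of band so the hot path never logs.
func statsMonitor(drops <-chan struct{}, every time.Duration) {
	var dropped uint64
	tick := time.NewTicker(every)
	defer tick.Stop()
	for {
		select {
		case <-drops:
			dropped++
		case <-tick.C:
			if dropped > 0 {
				log.Printf("dropped %d messages in the last %s", dropped, every)
				dropped = 0
			}
		}
	}
}

func main() {
	drops := make(chan struct{}, 1024) // buffered so the hot path never blocks
	go statsMonitor(drops, 10*time.Second)

	// Hot path: record a drop without blocking and without logging.
	for i := 0; i < 5000; i++ {
		select {
		case drops <- struct{}{}:
		default: // monitor is behind; lose the count rather than stall
		}
	}

	time.Sleep(11 * time.Second) // give the monitor a chance to report (demo only)
}
```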
Also, have good tooling to investigate end-to-end data path issues. There's the nats-top utility, and we wrote our own influx-spout tap, which is kind of like tcpdump for the metrics stream: it just looks at what's going on on a single subject. There's the link; it's super simple, so it's also a good example of how to write a very simple consumer of this NATS bus. And of course there's tcpdump for looking at things at the network level.

We're also moving this to Kubernetes now, as I mentioned, so that we truly have metrics as a service and can dynamically create these instances, ultimately even letting our end users do it. Making a Go program Kubernetes-ready involves a few things on the checklist. Have people here migrated stuff to Kubernetes a lot? One? So does this make sense? Yeah.

Make sure you can log to standard out. Keep things stateless: we were thinking about implementing an API to do reloads while the components are running, but because Kubernetes handles that for us, we just keep things nice and stateless. If you want to reload something, you restart the component: kill the pod and have it come back up, or let Kubernetes do a rolling update under the hood. That keeps it simple.

Instrument with Prometheus. That sounds a bit funny because we're actually ingesting Telegraf data, but to be honest, Telegraf might not be the best thing in the world for everything. We do have, and will keep having, a lot of stuff that we're not going to put into Kubernetes, maybe not ever, certainly not for a very long time. But for the stuff that is in Kubernetes, Prometheus is a very good choice, and here's the PR we're working on.

Also add liveness and readiness probes. Sometimes, if you have some web utility, you get that for free by having an HTTP endpoint that just returns a 200 response. But in some cases, like our UDP listener, you can't really probe the UDP port for health, so you need to add a probe endpoint yourself. So it's all fairly simple, and with Go it's really easy to migrate stuff to Kubernetes.

Then there's rolling your own methods versus using the standard library. Sometimes we've found it's worth it. We do a lot of string processing and we want high performance, and we found that until Go 1.10, appending strings was something our own custom method could do much faster than what Go offered. Another one is timestamp filtering. A counter-example is that initially we set up a very fancy buffer recycler (there's the blog post), and ultimately, at least in version 3, it turns out the garbage collector is good enough for our needs. I mean, do people actually build these custom buffer recyclers and such? OK, yeah. It's just a lot of extra complexity aimed at a scaling issue which might not even be there. So don't over-engineer up front, but have stress tests to validate these things early enough.

Premature optimization is bad? Yes, yes.

This is just an example of one thing we did. Actually, I ran it against Go 1.10, which has strings.Builder, and that does essentially the same thing, I think. We compared appending a string with fmt.Sprintf, which came out at about 211 nanoseconds per operation, against our custom line builder at 119. I test-ran it with strings.Builder this afternoon, and it's about the same performance as our custom one.
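For reference, a benchmark in that spirit might look like the sketch below. It is not the gist from the talk, just a minimal comparison between fmt.Sprintf and strings.Builder for assembling a line-protocol-style string; the measurement, values, and package name are made up.

```go
package linebuild

import (
	"fmt"
	"strconv"
	"strings"
	"testing"
)

// BenchmarkSprintf builds a line-protocol-style string with fmt.Sprintf.
func BenchmarkSprintf(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = fmt.Sprintf("cpu,host=%s usage=%d %d", "web01", 42, int64(1524070562000000000))
	}
}

// BenchmarkStringsBuilder builds the same string with strings.Builder (Go 1.10+).
func BenchmarkStringsBuilder(b *testing.B) {
	for i := 0; i < b.N; i++ {
		var sb strings.Builder
		sb.WriteString("cpu,host=")
		sb.WriteString("web01")
		sb.WriteString(" usage=")
		sb.WriteString(strconv.Itoa(42))
		sb.WriteByte(' ')
		sb.WriteString(strconv.FormatInt(1524070562000000000, 10))
		_ = sb.String()
	}
}
```

Save it as a _test.go file and run it with "go test -bench=." to compare the two on your own hardware.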
But the thing is that strings.Builder didn't exist until recently. And there's the gist with the benchmarks, so you can test it out.

Another one is this fast timestamp conversion, and I think this will remain the fastest implementation for a long time, because we're only interested in timestamps. As you can see in the comments, we can assume the input is a byte slice instead of a string, we only support base 10 and only positive values, and you can do a lot of additional optimization because we're dealing with nanosecond timestamps but only care about second resolution, or tens or hundreds of seconds, when we want to filter out incorrect timestamps. At some point, with this, we managed to drop the overhead from 14% down to 3%. So this is an example of where it pays off to write a custom method. A related thing is vendoring: instead of vendoring in the whole of InfluxDB when there's just one method you want to use, just copy that method in.

Why not try writing it in Go assembly?

No. I think you're the prime candidate for that; you seem like a heavy-duty optimizer. But this is now fast enough. That said, the code is available online, and if someone wants to take a shot at making it faster, I don't mind.

Overall, in conclusion, Go has been a very good fit for this project and for this move to a microservices model. The key things are being able to parallelize individual components with goroutines, the robust HTTP server, the efficient garbage collection, and being able to keep the code really simple while getting the job done. I think that's the main thing: the line count is quite small, and I don't think we've ever had to do anything very convoluted to get things working. Also, the surrounding ecosystem is all Go-based, which is great. These high-volume data flows have some interesting challenges, and there's probably more that our other devs could tell you about. And stress testing is very useful.

Here's the URL. This is actually the first project that our company has open sourced completely, and we're really excited to see what people think of it. We also have some job openings right now, I think, in the Singapore office; there's a spot open for a C++ developer. And we have a couple of external contributors who have been subcontracting on this whom I'd like to credit. We love PRs. Here's my Twitter handle. Thank you. Yes?

Why didn't you write it in C?

Well, the ecosystem is all Go-based. I was super tempted to write this in C, but I think ultimately we made the right choice, and it would be kind of nice if this could ever land upstream as well.

It's all cross-platform, so you don't need Go's portability, and C is a lot easier to optimize; you won't have the memory management.

Yeah, but looking at it after the fact, it's been a very educational experience doing this, and being able to scale and do multithreading very quickly on the node. So that's my take.

I have a question. At the beginning you said you have geographically distributed services. How does your solution work with multiple data centers which can be geographically distributed, maybe one in the United States and another one in Singapore?

In practice we haven't found that to be a problem; we have good connectivity between all our data centers. But you could technically do a multi-tiered solution where you have local NATS servers and they talk to each other.
So you could build a hierarchy. But even with a fairly distributed architecture, it's been pretty good.

So to be clear, you're connecting from a remote location to a remote NATS cluster in the general case? Yes. Wow, that's great.

Well, all the NATS infrastructure is local to a Kubernetes cluster; it's the metric data that comes in from all over the place.

So the listener could be in Los Angeles, for example, and the NATS cluster could be in Singapore? Technically, yes, but in practice this whole infrastructure sits in the same place. I think we've tried the other way as well.

So how do the metrics come in? Over HTTP and UDP, in the InfluxDB line protocol.

From remote locations? Yeah. And one thing the writers also do is aggregation: the metrics are quite small, and the UDP stuff has a max MTU of 1,500 bytes, so if everything went to InfluxDB directly as-is, that would be pretty crazy. So they aggregate the points into larger requests.

No error correction, no error checking, no reliability?

Well, we do check; the first place we'd catch it is in the Kapacitor and analytics layer, so we would see it if we were dropping stuff. But it's been pretty solid, and our network team is excellent as well.

You must trust your network a lot. We do trust our network with our lives, yes.

You said you filter timestamps. What counts as a bad timestamp, one that's earlier than timestamps you've already ingested?

It's when it goes far enough into the past or into the future, and that's sometimes difficult to judge, because some users want to push old data: they calculate something over old data and push it, so it might be from last week or so. But typically the worst thing we see is someone sending an explicitly defined timestamp, because if you define it explicitly, Telegraf doesn't add one for you. If you define it in seconds or milliseconds instead of nanoseconds, you get data that looks like it's from 1970, and that can cause a bit more trouble, with all kinds of weird charts showing up. So this filtering helps keep things clean.

Amazing. Thank you.
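To illustrate that last point, here is a minimal sketch of the kind of timestamp sanity check being described: parse a positive, base-10 nanosecond timestamp from a byte slice and reject anything too far in the past or future. It is not influx-spout's actual implementation, and the acceptance window values are made up; it also shows why a seconds-precision timestamp misread as nanoseconds looks like it came from 1970.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// parseNanoTimestamp converts an ASCII, base-10, positive nanosecond timestamp
// held in a byte slice, in the spirit of the custom parser described above.
func parseNanoTimestamp(b []byte) (int64, error) {
	if len(b) == 0 {
		return 0, errors.New("empty timestamp")
	}
	var ts int64
	for _, c := range b {
		if c < '0' || c > '9' {
			return 0, fmt.Errorf("invalid digit %q", c)
		}
		ts = ts*10 + int64(c-'0')
	}
	return ts, nil
}

// timestampOK rejects timestamps too far in the past or future.
// The windows here are arbitrary example values, not influx-spout's limits.
func timestampOK(ts int64, now time.Time) bool {
	age := now.UnixNano() - ts
	return age < int64(7*24*time.Hour) && age > -int64(time.Hour)
}

func main() {
	now := time.Now()
	good := []byte(fmt.Sprintf("%d", now.UnixNano()))
	// A seconds-precision timestamp treated as nanoseconds lands in 1970
	// and fails the age check.
	bad := []byte(fmt.Sprintf("%d", now.Unix()))

	for _, b := range [][]byte{good, bad} {
		ts, err := parseNanoTimestamp(b)
		fmt.Println(string(b), err == nil && timestampOK(ts, now))
	}
}
```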