So, does anyone else in the audience have experience working with these kinds of techniques? Hello. Hi Jacim.

I took some notes, let me pull those up. Hi, so I used to be in Bangalore and I ran this meetup for a while, so it's nice to be back. I hope I'm audible; my connection has been pretty bad and the latency sucked.

One of the things I've noticed is that hedged requests in practice are interesting. One of the open source products that does something similar is Cassandra, a distributed database that gives you scalability and redundancy by storing multiple copies of the data. Wait, I think I'm going to turn off my video, that might help. I hope this is better. If you actually duplicate requests for everything, it can be prohibitively expensive, even at low scale. The way Cassandra stores data is to keep three copies of everything, and whenever a read comes in you ask all of the replicas and wait for the first two to come back; that's how you achieve consensus and so on. On paper this is great, but in practice it means every client request fans out into three requests inside the cluster. I used to work for a bank where we were doing about 60,000 to 70,000 reads per second against Cassandra; tripling that gets fairly expensive. So in practice, something like Cassandra will normally make just the one request, and only issue a backup request if one of the replicas fails, or if it's running significantly slower than the 99th percentile. "Your mileage may vary" is the usual caveat with hedged requests.
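To make the hedging idea concrete, here's a minimal sketch in Go of the pattern described above: send the read to one replica, and only fire a backup request if the first hasn't answered within something like the observed p99. The replica functions, names, and the 50ms threshold are all made up for illustration; this is a sketch of the general technique, not how Cassandra's speculative retry is actually implemented.

```go
package main

import (
	"context"
	"fmt"
	"math/rand"
	"time"
)

// hedgedCall sends the request to the first replica, and only if that replica
// hasn't answered within hedgeAfter (standing in for the observed p99 latency)
// does it fire a backup request to the next one. First response wins.
func hedgedCall(ctx context.Context, replicas []func(context.Context) (string, error), hedgeAfter time.Duration) (string, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // tell the losing replica call to stop

	type result struct {
		val string
		err error
	}
	results := make(chan result, len(replicas))

	launch := func(call func(context.Context) (string, error)) {
		go func() {
			v, err := call(ctx)
			results <- result{v, err}
		}()
	}

	launch(replicas[0])
	timer := time.NewTimer(hedgeAfter)
	defer timer.Stop()

	sent := 1
	for {
		select {
		case <-timer.C:
			if sent < len(replicas) { // primary is slow: hedge once
				launch(replicas[sent])
				sent++
			}
		case r := <-results:
			return r.val, r.err
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}

func main() {
	// fakeReplica simulates a backend that answers after roughly `latency`.
	fakeReplica := func(name string, latency time.Duration) func(context.Context) (string, error) {
		return func(ctx context.Context) (string, error) {
			select {
			case <-time.After(latency + time.Duration(rand.Intn(20))*time.Millisecond):
				return "reply from " + name, nil
			case <-ctx.Done():
				return "", ctx.Err()
			}
		}
	}

	replicas := []func(context.Context) (string, error){
		fakeReplica("replica-a", 300*time.Millisecond), // unusually slow today
		fakeReplica("replica-b", 20*time.Millisecond),
	}

	// Hedge after 50ms, standing in for a measured p99.
	out, err := hedgedCall(context.Background(), replicas, 50*time.Millisecond)
	fmt.Println(out, err)
}
```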
I also have some stories of crons screwing up latency. One of the interesting things is that, because of the way crons get scheduled, a lot of them run exactly on the hour. So you'll have a pile of crons at midnight, a pile at 4am, at 5am, and so on. We noticed that our latency would always be bad just after the hour marks: at 1pm it's bad, at 2pm it's bad. The way we solved that was to give users no control over when their crons run, and in practice that worked really well for us. We wrote a Kubernetes controller that took over the timing of when the crons actually get run, except for the very few things that had to run at specific times. This worked for users because they were happy that things ran better, even though they had no idea exactly when a job would run.

Something else, on micro-partitioning and the question someone raised about how clients figure out where to send things: I have some context on this. I now work at Apple, on our traffic team, so my entire job is figuring out where to send requests. This is close to home, though I can't talk about what Apple does, sadly. The simplest way to do load balancing is to throw DNS at it: you have four backend servers, you put DNS in front of them and split traffic equally or something, and that works well. If you're small, that's probably what you should do. The slightly larger thing companies end up doing is data-plane proxies like Envoy or Linkerd. At my last job we ran Linkerd at pretty high scale for about two years. This works really well if you're okay paying the cost of one extra hop on all of your network requests, but it stops working past a certain scale. For example, Google doesn't do any load balancing internally; they don't have load balancers in the traditional sense at all.

What they do instead is have fat clients that understand their backends. The only public reference to this is what's called look-aside load balancing, which you'll find in the gRPC docs. You have a look-aside process that behaves a bit like DNS: all the servers report to it with their capacity and how they're doing, clients poll it, and the look-aside load balancer hands back a list of endpoints to actually connect to, kind of like DNS would. At really large scale, that's what Google does; you don't run anything like Envoy. I can't say what Apple does, but it's kind of like Google.

Yeah, those are some of the notes I took. If you want to read about how some of this micro-partitioning works at very large scale, Facebook has something called Katran, K-A-T-R-A-N, and they wrote a blog post about how it works. It's a really useful read. Load balancing at the client end is what people at insane scale do, but if you're talking about something under 10 or 20 billion requests per day, you can probably get away with something that isn't client-side load balancing. Yeah, that's it for me.
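To illustrate the look-aside pattern described above, here's a rough sketch in Go of the client side: the client periodically pulls an endpoint list, with weights derived from whatever the servers reported, from the look-aside process, and then picks a backend locally so there's no proxy hop on the data path. The Endpoint type, the fetch function, and the addresses are all invented for the example; this is not gRPC's actual look-aside API.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// Endpoint is what the look-aside balancer hands back to clients: an address
// plus a weight derived from the capacity each server reported.
type Endpoint struct {
	Addr   string
	Weight int
}

// lookasideClient keeps a local copy of the endpoint list and refreshes it in
// the background, the way a client might poll a look-aside service or DNS.
type lookasideClient struct {
	mu        sync.RWMutex
	endpoints []Endpoint
}

// refresh pretends to poll the look-aside balancer; fetch is a stand-in for
// whatever RPC or DNS-style lookup the real system would use.
func (c *lookasideClient) refresh(fetch func() []Endpoint, every time.Duration) {
	for {
		eps := fetch()
		c.mu.Lock()
		c.endpoints = eps
		c.mu.Unlock()
		time.Sleep(every)
	}
}

// pick does weighted-random selection locally, so the request goes straight to
// a backend with no proxy in the data path.
func (c *lookasideClient) pick() (Endpoint, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()

	total := 0
	for _, e := range c.endpoints {
		total += e.Weight
	}
	if total == 0 {
		return Endpoint{}, false
	}
	n := rand.Intn(total)
	for _, e := range c.endpoints {
		if n < e.Weight {
			return e, true
		}
		n -= e.Weight
	}
	return Endpoint{}, false
}

func main() {
	// Fake control-plane answer: in a real system the weights would come from
	// servers reporting capacity and health to the look-aside process.
	fetch := func() []Endpoint {
		return []Endpoint{
			{Addr: "10.0.0.1:8080", Weight: 3},
			{Addr: "10.0.0.2:8080", Weight: 1}, // degraded, gets less traffic
		}
	}

	c := &lookasideClient{}
	go c.refresh(fetch, 10*time.Second)
	time.Sleep(10 * time.Millisecond) // let the first refresh land

	for i := 0; i < 5; i++ {
		if ep, ok := c.pick(); ok {
			fmt.Println("would send request to", ep.Addr)
		}
	}
}
```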
You said you have some experience with Finagle as well. Do you want to talk about that?

Oh, Finagle, I used it indirectly. Linkerd 1 was a Scala project that used Finagle as... oh yeah, that link, the one Mira posted, that's pretty useful. Finagle is basically Twitter's internal RPC library. Linkerd 1 was a Scala project built on Finagle, and in our case it was used as a service mesh. Finagle is really rock solid and pretty great, but it works best when you have a few very large JVM machines, because it's fairly heavy: the minimum you need to run a single Finagle instance under reasonable load is on the order of one or two gigabytes of RAM and at least half of a mid-sized core. So if you're running an architecture with lots of small instances, you can't really use Finagle everywhere unless you bake Finagle into your app. Linkerd 1, in that sense, was extremely expensive to run; for a while we were using something like 400 cores just for Finagle, and that's pretty expensive. Linkerd 2 doesn't have this problem because it isn't built on Finagle at all; it's an entirely new implementation in Rust.

So, just to be clear: I think at Capillary, way back in 2014, we looked at Finagle and figured it would be like hitting a nail with a very big hammer. It seemed like over-engineering, or rather not over-engineering, but a piece of technology that had a lot of hype around it at that point in time. So I appreciate what you're saying about Finagle. However, the paper that introduced Finagle, "Your Server as a Function", I think that came out in 2010 or 2011, is a brilliant paper, and again I'd strongly encourage everyone to read it. It's a very nice explanation of how you can represent and compose servers using simple functional programming ideas.

Yeah, so some of the stuff we did was crazy, because I worked for a company that ran something like 2,000 microservices. I don't know of anyone other than Uber and Netflix doing things like that, so we had to do odd things nobody else did, because of the other odd things we did. But for about four years we ran Finagle, via Linkerd 1, in production, and it was pretty great. At some point it became too expensive, so we switched to Envoy, which is significantly faster and also much cheaper to run because it's all C++, and that's what's running in production now. Also, in hindsight, all of that load looks trivially small compared to what we do at Apple; the iOS updates alone are triple-digit terabits per second. That's just madness.
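As a footnote on the "Your Server as a Function" idea mentioned above: the paper is written in Scala, but the core abstraction translates into most languages. Here's a minimal sketch in Go of what it boils down to: a service is just a function from request to response, and cross-cutting behaviour composes as filters that wrap one service and return another. The withTimeout filter and the toy string service are made up for illustration; this is not Finagle's actual API.

```go
package main

import (
	"context"
	"fmt"
	"strings"
	"time"
)

// Service is the core idea from the paper: a server is an asynchronous
// function from a request to a response. (The paper writes this in Scala as
// Req => Future[Rep]; a context plus an error return is a rough Go analogue.)
type Service[Req, Rep any] func(ctx context.Context, req Req) (Rep, error)

// Filter is the second idea: behaviour like timeouts, retries, auth, or
// logging is a function that wraps one Service and returns another, so
// services compose like ordinary functions.
type Filter[Req, Rep any] func(next Service[Req, Rep]) Service[Req, Rep]

// withTimeout is an illustrative filter that bounds how long the wrapped
// service may take.
func withTimeout[Req, Rep any](d time.Duration) Filter[Req, Rep] {
	return func(next Service[Req, Rep]) Service[Req, Rep] {
		return func(ctx context.Context, req Req) (Rep, error) {
			ctx, cancel := context.WithTimeout(ctx, d)
			defer cancel()
			return next(ctx, req)
		}
	}
}

func main() {
	// A trivial "server": uppercase whatever it is sent.
	var shout Service[string, string] = func(ctx context.Context, req string) (string, error) {
		return strings.ToUpper(req), nil
	}

	// Composing the filter with the service yields another service.
	svc := withTimeout[string, string](100 * time.Millisecond)(shout)

	rep, err := svc(context.Background(), "hello finagle")
	fmt.Println(rep, err)
}
```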