Welcome to KubeCon, and welcome to our talk on how to build a distributed system, and also, should you? We are very excited to be here. This is my first KubeCon, so I'm just as excited to learn as I am to tell you about what we've been working on.

So who are we? We are Patrick Deziel and Rebecca Bilbro, and we both work for a company called Rotational Labs. Rotational Labs builds small to medium sized geodistributed systems. These are not Netflix, Facebook, or Twitter scale distributed systems, but they are distributed throughout the world, and because of that we encounter a lot of interesting problems in this space, some of which we're going to talk about with you today. There's a lot of overlap between the problems we encounter and the solutions you are probably working on or thinking about in your own work, so we're hoping you'll come up and chat with us later. Hopefully you'll recognize us, come find us, and tell us about what you do and what kinds of systems you're working on.

I would be remiss not to mention our team. The Rotational Labs team is also geodistributed, so we're from all over the world, and I want to give a special shout-out to Karen and Benjamin, who did a lot of the experimentation that led to this talk. The GeoPing project that we'll talk about is something they incubated. If you're curious about what we and the rest of the team are working on, I'd encourage you to check out rotational.app for a sneak peek.

Our talk is a talk in four movements. First, I'm going to set the scene and tell you about a geodistributed system that we've been building and maintaining for the last two years or so. Then we're going to step back a little: what is a distributed system, and what are some of the key problems you have to solve when you're building and maintaining one? Then we're going to talk about some of the hard-won lessons, the things we messed up. We'll show our cards and be honest about what didn't go so well the first time, and talk about how we worked through those problems. And then we're going to talk a little about what we think is coming next, based on the experiences we've had at Rotational Labs: what we think is coming to an open source repository near you soon.

Okay, so first I'm going to tell you the story of the Global Directory Service. Imagine that ten years ago, governments were not really paying attention to crypto. In the last couple of years crypto has become a thing, governments around the world are starting to catch on, and they really want to be able to audit crypto transactions, especially ones above some monetary threshold. Because of that, there are now groups of organizations around the world trying to form alliances and figure out how to share the information that would allow governments to audit their citizens' transactions. This is hard. If you know anything about crypto, the blockchain doesn't really support sharing the kinds of information you would need to do auditing as a government official. There's no way to look somebody up; a wallet address is not a person. So it's definitely a challenge, and this is what we needed to build for.
We were tasked with working on this for a nonprofit group of people trying to work in this space, and this is what we built. You can picture it as a kind of global yellow pages that allows different virtual asset service providers to look each other up and exchange the secure, private information that would let governments do audits if they need to. And I want to emphasize that this is secure, private information being exchanged: street addresses of people with wallets, maybe social security numbers or the international equivalent. So it needs to be safe, secure, and private. We've got a web app that institutions and organizations can use to enroll. There's an approval process, and after that you get identity certificates and public/private key pairs, which you can use to establish mutual TLS connections with counterparties and then exchange cryptographically sealed envelopes with that secret information inside. We use gRPC to get the lowest latency possible, because we don't want to slow down transactions on the chain.

The very first problem we had to solve is: where do we store the public certificates? You need to be able to look up the public certificate for your counterparty, so we have to store that data somewhere. Where should it go? Should we store it in Singapore, since Singapore was the fastest-moving place in terms of government regulation? Maybe it makes sense to store the data there. But if we do, then anybody doing transactions who isn't co-located with Singapore pays an extra latency fee, which doesn't really sound fair. It's not in the spirit of a distributed ledger. So we need to figure out some way to distribute the system.

That brings us to the question: what is a distributed system? When I say distributed system, hopefully some of you already have a picture in your mind of a group of servers arrayed together in some way, interacting and communicating with each other, but from the outside, from a user's or client's perspective, you're just interacting with one thing. There's this veil of a singular machine, but it's actually many machines working together. That's really critical, because things go wrong all the time. There are natural disasters, there are outages, things go down. Multiple servers working together allow the system to tolerate those failures without losing critical data that's in transit. The multiple machines also make the system more available, which is good for user experience: you don't have to wait a long time for a response to a query, because it goes to the place closest to you. The way this works is that each peer in the system has a copy of the database, so they can respond independently to requests that come in concurrently. So why do they even need to work together if they all have a copy, and how do they work together? Well, a quick tangent here: they're going to work together using gRPC, which is Google's flavor of remote procedure calls.
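To make the mutual TLS piece concrete, here's a minimal Go sketch of what dialing a counterparty over mTLS with gRPC can look like. This is illustrative, not our actual GDS code; the file names and the address are placeholders.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"log"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

func main() {
	// Load our identity certificate, the one issued during enrollment.
	cert, err := tls.LoadX509KeyPair("client.crt", "client.key")
	if err != nil {
		log.Fatal(err)
	}

	// Trust the directory's CA so we can verify the counterparty's cert.
	caPEM, err := os.ReadFile("ca.crt")
	if err != nil {
		log.Fatal(err)
	}
	pool := x509.NewCertPool()
	pool.AppendCertsFromPEM(caPEM)

	creds := credentials.NewTLS(&tls.Config{
		Certificates: []tls.Certificate{cert}, // presented to the server (the "mutual" in mTLS)
		RootCAs:      pool,
	})

	conn, err := grpc.Dial("counterparty.example.com:443", grpc.WithTransportCredentials(creds))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	// From here you would create a generated gRPC client from conn and
	// exchange cryptographically sealed envelopes with the counterparty.
}
```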
With gRPC, you can define an API with proto files, compile that into the language of your choice, and then run the generated code on two different machines. Because both machines recognize the same RPC definitions, one can actually execute procedures on the remote machine, and that's how this is going to work. Why do the peers need to cooperate? Because when users are making changes on independent peers, those peers at some point need to synchronize so they can share data that's been added by different users around the world. And sometimes we get into the situation where two users have concurrently made a change to the same object, but with different values, and then we have to figure out how to decide who wins. That is one of the big problems in distributed systems.

There are actually a lot of ways to solve that problem, and it depends on your business case and what you need. If you need every peer to always give the same answer, then you need strong consistency, but you're going to pay a cost: it takes more time and it's probably going to be more expensive. If you can relax those requirements a little and do something more eventual, you might be able to get lower latency and higher availability, but sometimes you might get a stale read. So there are trade-offs. For us, we picked eventual consistency; it was a good budget-friendly choice. Basically, it guarantees that if everybody stops making updates, eventually all the peers will agree on what the data should be.

I also want to shout out some of the things we built our solution on top of. This is an open source solution. We built it for a nonprofit, so it's all out there; you can look at the code. We built two main packages, Honu and Turtle. They're both names for turtles, so there's a turtle theme going on here. These two open source packages are built on other open source packages like LevelDB, Prometheus, and protocol buffers. The way to think about it is that Honu's job is to add metadata versioning on top of LevelDB so that we can compare objects and decide who should win, and Turtle is in charge of actually coordinating synchronization between peers. Basically, peers periodically synchronize: they compare their version vectors and then make updates according to a bilateral anti-entropy algorithm, which I won't go into here, but you can definitely ask me about it after and I will go into great detail. So I'm going to turn it over to Patrick to talk a little bit about some of the things we learned along the way.

Thanks, Rebecca. The first thing I want to talk about is that in this community we're usually trained not to get too attached to pods. There's a "cattle, not pets" mentality, which means we view pods as vehicles of execution, not as any form of state. They don't stick around too long, and we expect Kubernetes to destroy and create them at will. But that's contrary to what Rebecca was just talking about, because we really want that metadata when we're making those queries to the databases. So in order to actually perform this replication, we need pods to be addressable, which is easier said than done on Kubernetes.
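Before we get into the Kubernetes side, here's a flavor of the version comparisons Rebecca described. This is an illustrative Go sketch of the general version vector technique, not Honu's actual implementation.

```go
package main

import "fmt"

// VersionVector maps each replica's process ID to the latest version of an
// object that replica has seen.
type VersionVector map[uint64]uint64

// Compare returns -1 if a happened before b, 1 if b happened before a, and
// 0 if neither dominates (the versions are equal or concurrent, so a tie
// must be broken some other way, e.g. by the lowest process ID).
func Compare(a, b VersionVector) int {
	aAhead, bAhead := false, false
	for pid, av := range a {
		if av > b[pid] {
			aAhead = true
		}
	}
	for pid, bv := range b {
		if bv > a[pid] {
			bAhead = true
		}
	}
	switch {
	case bAhead && !aAhead:
		return -1 // b has seen everything a has, and more
	case aAhead && !bAhead:
		return 1 // a has seen everything b has, and more
	default:
		return 0
	}
}

func main() {
	a := VersionVector{1: 3, 2: 1}
	b := VersionVector{1: 3, 2: 2}
	fmt.Println(Compare(a, b)) // -1: a happened before b, so b's value wins
}
```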
So imagine two cows on a ranch, Bessie and Clarabelle. They both need to be able to gossip with each other and exchange their version vectors in order for the system to be eventually consistent. What this means in Kubernetes terms is that we can accomplish it using a combination of a few things. We can use a StatefulSet, which is basically a way to start a bunch of pods that have an associated identity, and which enforces uniqueness, so we're never starting two pods with the same identity when we have to redeploy things or change configuration. That's one aspect. The other aspect is a ConfigMap, which is how the pods find out who they are, who everybody else they're allowed to talk to is, and how to talk to them. You can imagine that replicas.json file on the left being included in all of the deployment manifests. When you start up, you're a cow waking up on the ranch: you figure out who you are, and from that you figure out your process ID, what region you're in, which is pretty important for some of our metadata, and who the other cows on the ranch are that you can talk to. It's really important that the process IDs are unique, because we need to break ties, as Rebecca said, to make sure the replicas actually converge.
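Here's a rough Go sketch of that wake-up routine: reading a mounted replicas.json and using the stable StatefulSet hostname to figure out which entry is "me". The file path and field names are made up for illustration; they're not our actual manifests.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
)

// Peer describes one replica in the (hypothetical) replicas.json file
// mounted from the ConfigMap.
type Peer struct {
	PID    uint64 `json:"pid"`    // unique process ID, used to break ties
	Name   string `json:"name"`   // stable StatefulSet pod name, e.g. "replica-1"
	Region string `json:"region"` // e.g. "us-central1"
	Addr   string `json:"addr"`   // how the other cows reach this one
}

func main() {
	data, err := os.ReadFile("/etc/replicas/replicas.json")
	if err != nil {
		log.Fatal(err)
	}
	var peers []Peer
	if err := json.Unmarshal(data, &peers); err != nil {
		log.Fatal(err)
	}

	// A StatefulSet gives each pod a stable, unique hostname, so the
	// hostname tells us which entry in the shared config is "me".
	hostname, err := os.Hostname()
	if err != nil {
		log.Fatal(err)
	}
	for _, p := range peers {
		if p.Name == hostname {
			fmt.Printf("I am %s (pid %d) in %s\n", p.Name, p.PID, p.Region)
		} else {
			fmt.Printf("peer %s is reachable at %s\n", p.Name, p.Addr)
		}
	}
}
```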
There's another problem we encountered, which has more to do with how clients interact with our system. When we're building geodistributed systems, it's really important to consider where our users are actually coming from, because they're coming from all over the world, and we don't want to give them undue latency when we don't need to. If I'm connecting from France, I don't want to go all the way over to the Singapore server to make a request. But there's a bit of a problem with using something like the Google Cloud load balancer, because we want our clients to connect over gRPC for performance reasons, and that's not something that's very easy to do with that load balancer. That's one issue. The other issue is TLS termination at the load balancer. If you don't know what that is, it's the point at which we interrupt traffic and decrypt it into unencrypted traffic. Sometimes that's something you very much want, for example if you need to inspect what's actually coming into the private network. In our case it's something we don't want, because we're enforcing mTLS all the way from the client to the gRPC endpoint services, where our Turtle or another web-facing service might be. We really need to enforce mTLS across the entire connection.

So one thing we can do is push the routing up to the DNS layer. That's a way of getting around the problem of not being able to use gRPC with the aforementioned load balancer. Here's an example of how it might work. You have a client in Detroit who is trying to resolve the domain name, and we have a list of specific IP addresses that point to all the different deployed ingress services we have across the world. The DNS resolver points the user to the right IP address, and they can cache it; usually your computer caches that information. Then, when they make a new gRPC request to geo.testnet.io, it goes directly to the gRPC service, because they already have the IP address.

This brings us to a project we built called GeoPing, which in short is just a gRPC service that knows where it is. It's a glorified version of the Unix ping: instead of pinging and getting an IP address back, you ping and you get back the region of the closest server to you. The reason we built it is that we wanted a way to actually test that our DNS routing is working, since the routing isn't something we do ourselves; we're relying on an external service for it. It's a pretty simple service. If you've never looked at a gRPC service definition, this is probably the simplest one you could start with: it's just one RPC, Status, and it returns some information about the server you've connected to, which we can just configure on the server side. We don't have to worry about any routing; it's purely for testing.

So at this point, a little demo. On the left we have a command window, and we can ping geo.testnet.io, like I mentioned before, and hopefully it returns us-central1, because the DNS resolved to the region closest to us. What's really interesting is that we can then use a VPN client to connect to Singapore, and now we're getting asia-southeast1 back. So if we were in Singapore, we would actually be accessing the gRPC service in the asia-southeast1 region. Does anybody have a favorite European country? Finland, good choice. So, connecting to Finland and trying this... my local computer might be caching the DNS, so we have to flush the DNS cache first. And now, hopefully, this should give us the europe-west3 region. There it is, so that proves our routing works. Basically, we tested this a bunch from all the different regions, and it turns out that Google's DNS geo-based routing works pretty well.

This is an example of what we're doing with it: we're running simulations to see what it would look like if we had traffic coming from all across the world. You'll notice there's a bit of daily seasonality in the upper right chart, and I think that's the most important takeaway. As day becomes night and people wake up and go to sleep, the traffic changes across the world, so you might have us-central1 peaking while asia-southeast1 is troughing. It's really important to consider more than just geography; time of day is something you also need to take into account with respect to routing.

So this is what our solution for geo-aware DNS routing looks like right now. We're using Google Cloud DNS with their geo-based routing policy. This gives us a few advantages: it means we're a bit more cloud native, but we can still swap out that DNS solution for any other one we come across.
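If you want to try the same sanity check yourself without a gRPC client, here's a small, self-contained Go sketch that just resolves the name and prints the IPs you were routed to. The hostname is the one from the talk, so no promises it still resolves; with a geo-based routing policy, the answer should differ depending on where you (or your VPN exit) are.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"net"
	"time"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Ask the resolver which regional ingress IP we get handed.
	ips, err := net.DefaultResolver.LookupIPAddr(ctx, "geo.testnet.io")
	if err != nil {
		log.Fatal(err)
	}
	for _, ip := range ips {
		fmt.Println(ip.IP) // from Detroit vs. a Singapore VPN, expect different answers
	}
}
```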
And at the end of the day, we're just running Kubernetes on Google Cloud, so that's also something we could swap out. So we have ways of staying in the driver's seat.

Now we want to shift and talk about what we think the future of distributed systems is going to be. We can return to our earlier definition, Rebecca's definition of a distributed system: a network of computers working together to appear to the end user as a single computer. That's the classical, textbook definition of what a distributed system might be. But think about how the internet has changed. We have a lot more people on their phones, a lot of users who were not part of the internet maybe ten years ago now are, and the idea of what a server is has also changed a lot: servers are smaller and more mobile now. The point is that it's a different world, and maybe we need a new definition of what a distributed system actually is, rather than this monolithic archetype.

So here's our new definition: a flow of events across time and space. The way to think about this is a more microservice-y approach, where you think about all the data being generated across the world in terms of events in time. Those events can be ordered, they can be pushed to microservices, and those microservices can push to other microservices and queues. You might know what I'm talking about if you've used any of these tools: message queuing, pub/sub, stream processing. We think these are going to keep growing in popularity, because at the end of the day, what you need to do in a distributed system is send data around the world, and the best way to do that is to think about data in terms of events, not necessarily in a traditional database format. Although I do want to mention that databases are obviously still very important; you still need your Postgres and your SQLite running so you can have some kind of local storage.
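As a sketch of what "events across time and space" can mean in practice, here's an illustrative Go example of an event envelope that carries both a timestamp and a region, so downstream consumers can order events and reason about geography. The structure and field names are ours for illustration, not any particular tool's format.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Event is an illustrative envelope: every event records when and where
// it happened.
type Event struct {
	ID        uint64          `json:"id"`        // per-stream sequence number, for ordering
	Timestamp time.Time       `json:"timestamp"` // when it happened
	Region    string          `json:"region"`    // where it happened
	Topic     string          `json:"topic"`
	Data      json.RawMessage `json:"data"`
}

func main() {
	evt := Event{
		ID:        42,
		Timestamp: time.Now().UTC(),
		Region:    "europe-west3",
		Topic:     "transactions",
		Data:      json.RawMessage(`{"amount": 1500}`),
	}
	out, err := json.Marshal(evt)
	if err != nil {
		panic(err)
	}
	// Push this to a message queue, a pub/sub topic, or a stream processor.
	fmt.Println(string(out))
}
```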
Now I'll pass it back to Rebecca to talk about some of the new features we expect in these distributed systems.

So real quick, I want to bring it back to open source, since we're at a big open source conference. What is coming next in open source? We think there are four things. The first is persistence turned on by default. Having persistence on by default means we can do things like time travel, and we can recover from disaster more easily. Most eventing solutions out there have persistence turned off by default, and that's intentional, because it improves throughput and reduces latency. But it takes away some of the key features that are going to be required if we start moving towards events as the atom of distributed systems. The next thing is geographic event encoding. For the most part, cloud solutions assume you don't want to know about geography; that's kind of what the cloud is. You can pretend it's this magical thing up there that isn't physically anywhere, but it is physically in a lot of places. Having geographic encoding will empower us to build a better user experience and comply with local regulations wherever the data is, so it's going to unlock a lot of features we need.

Another one is total ordering by default. Most eventing solutions right now can only guarantee total ordering for brokers on the same node, and that's because total ordering is expensive: it requires consensus, and as we talked about at the beginning, strong consistency is slower and you pay more for it. But if we had total ordering as a readily available open source solution, we could guarantee a more equitable global engagement with our applications. And the last one, which I assume is probably most exciting to a group of Kubernetes people, is automatic reconfiguration. In fact, I'm hearing a lot of people saying those words already here at the conference this week, so I know it's on everybody's mind. You can't just infinitely scale your DevOps team. We need better algorithms that make multi-cluster more realistic for small teams, and most of us probably work on a small team rather than a huge team with infinite resources. So we think that's what's coming next.

If you want to learn more about what we're doing, please visit us at rotational.app, or just come and chat with us at the conference. We'd love to hear what you're working on, what kinds of problems you're solving, and what open source packages you're building or would like to build. Maybe we can work together. Thank you.

So I think this is the time when we take questions, right? All right. Any questions? Yep.

Hi. For your DNS router, where were you running the GeoPing check?

Sorry, we have two concurrent questions, which perfectly proves the point that failures happen all the time in distributed systems. Do you mind if I answer this other question first and then do the router question? All right. So the question is: we built our own distributed database; why did we not go with a readily available commercial solution instead of doing something insane like building our own? The reason is that the commercial solutions are super expensive. We're working for a nonprofit that hired us to do this, and they do not have the money to spend on something like Aurora or Spanner. It's a great question, and I assume a lot of people are in the same boat: shelling out money for Spanner is just way beyond most people's operating budgets, especially in this economy. Okay, so the other question was about GeoPing.

Yes, I was just wondering where you ran the GeoPing check for your DNS router. Where does that signal go? You're using Cloud DNS, right, and you're routing the customer yourself using GeoPing. Where do you run that check?

Do you want to take this one? Sure. So the question is where we're directing the client to after the resolver converts the domain name into an IP address. That goes directly to our ingress reverse proxy; I think we're using Traefik for that. Traefik is sitting on the Kubernetes cluster along with the actual pods where our gRPC services are deployed.
So that's how a request actually reaches the service at the end of the day. Is there another question?

Thank you. In distributed systems, network partitions can happen all the time, especially in geographically distributed systems like yours. How do you test for resiliency and reliability, and are there any lessons you want to share with us?

That's a really good question. So the question is: it's commonplace in geographically distributed systems to have partitions, where two parts of the system temporarily can't talk to each other. That means each side might be making a lot of updates locally, and then they have to figure out how to synchronize a whole batch of updates that might have been made over an hour, or maybe even longer if the fault lasts a long time. So, in our solution, how do we test for resiliency and recovery? We've actually run some experiments where we change the anti-entropy interval. If we speed up the anti-entropy interval, meaning synchronizations happen more often, we reduce staleness, and that does help protect against having a huge pile of updates to reconcile when there is a partition. But we don't yet have an experiment orchestrated to simulate something like an earthquake that cuts off Singapore from, say, Ohio, where our US servers are. It's a really good idea, and I would love to implement it, because I think you would clock it: you'd want to know how long it takes for the system to become synchronized again after a fault like that. It's a great question.
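For a sense of the knob we were turning in those experiments, here's an illustrative Go sketch of a periodic anti-entropy loop with a configurable interval. It shows the shape of the thing, not the actual Turtle code; the peer addresses are placeholders.

```go
package main

import (
	"log"
	"math/rand"
	"time"
)

// antiEntropy kicks off a synchronization pass against a random peer on a
// fixed interval. Shortening the interval reduces staleness at the cost of
// more network chatter, which is the trade-off measured in the experiments.
func antiEntropy(peers []string, interval time.Duration, done <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			peer := peers[rand.Intn(len(peers))]
			log.Printf("anti-entropy pass: comparing version vectors with %s", peer)
			// ... exchange version vectors and request any missing updates ...
		case <-done:
			return
		}
	}
}

func main() {
	done := make(chan struct{})
	go antiEntropy([]string{"replica-1:4436", "replica-2:4436"}, 10*time.Second, done)
	time.Sleep(25 * time.Second) // let a couple of passes run, then stop
	close(done)
}
```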
One of the common problems our team has run into in building new systems is getting it done and then figuring out what we have to test, as opposed to deciding what we want to test first, building those tools, and then building the system to match. There are good reasons not to test everything; maybe the catastrophic earthquake case you were just talking about isn't the most pressing concern. I'm curious how those conversations went for you and your team as you were building this product.

Yeah, we have an advantage: our CEO has a PhD in distributed systems, and in fact I think he's sitting somewhere in the audience over there. So he has textbook knowledge of all the things that can go wrong, and he has encouraged us to think about failure the whole way through, and that mindset shapes how we write code. In fact, just writing everything in Go encourages you to think that way, with the if err != nil check after whatever your function call is. The philosophy is: this probably won't work, but if it did, keep going. So with that philosophy of continually checking for possible errors, we do have to test for things like race conditions, because that's a very common problem, way more common than earthquakes for sure. How to have that conversation about what to test for is a great question; I would say concurrency bugs are the number one thing. And we did have a lot of them. The anti-entropy routine is extremely concurrent, because as one replica is sending all of its version vectors to the other replica, the other replica is already starting to send back requests for updates, and those things are happening concurrently. There's a lot that can go wrong there. Great question.

I have a question about the use of location. How would redefining distributed systems as a flow of events across space and time change the user experience for apps like Uber and Tinder?

So the question was: how is this reframing of distributed systems as a flow of events across space and time going to change how we build apps like Uber and Tinder? I was actually watching a video from a Kafka conference where Tinder was presenting. If you Google "Kafka and Tinder," it might not be the first result, but hopefully whatever you find will be safe for work; there's a really great video by one of the engineers at Tinder talking about how they had to re-implement their entire Kafka flow because they didn't have strict ordering turned on initially. What was happening is that people who travel a lot would get dating recommendations out of order. People going back and forth between, say, South Korea and London for work would get recommendations for dates in the region where they weren't. So as we start to identify these bugs, I think people are going to rethink things like persistence by default and total ordering; these features are going to become more commonplace because they support a way better user experience. And it wasn't a problem until recently, because systems just weren't fast enough for it to ever happen. Now things are so much faster that we're starting to see these problems crop up, usually in user interfaces. That's a great question.

All right, I think we're out of time, but we'll still be around. We might have to move out of the room, but come and talk to us. Thank you so much for coming. Thank you.