Okay, thanks for joining this session. Our next section is about choosing cloud-native technology for the journey to multi-cloud. Thanks once again for joining, including the virtual audience. A quick introduction to myself: my name is Daniel Loh, I'm a CNCF ambassador and a track chair and track committee member at KubeCon. I'm very happy to introduce our great speaker, Adelina from Form3, a technology evangelist. So please welcome our next great speaker.

Hello, everyone. Hi, I'm Adelina, and today we'll be talking about Form3's journey to multi-cloud, some of the technologies that we use, and exactly how they fit together. It's my first time at KubeCon and my first time speaking on this amazing stage, and I really appreciate that so many of you took the time to hear me. Daniel already said that I'm a technology evangelist at Form3, but I just want to give a shout-out to our amazing engineering team, who have done all of the great work that I have the honor to present to you today.

All right, so this talk will consist of a quick introduction to why we decided to go multi-cloud and what it actually means. I'll set the background and show you our previous architecture, and then I will introduce each of the technologies that have made the multi-cloud transition possible: in particular, Kubernetes, Cilium, NATS, and CockroachDB. These are the amazing cloud-agnostic technologies that we'll be learning about today.

All right, so let's begin; we've got a lot of ground to cover. First, I'd like to introduce exactly what Form3 does. Form3 sits between our customers, which are financial institutions, and the external payment infrastructures that power interbank transactions. As you can imagine, banks don't integrate with each other directly; that would be a maintenance nightmare. Instead, when they process payments between banks, they go through external payment infrastructures, which define the standards used for interbank transactions. We make our customers' lives easier because we take care of the secure processing of the actual payments, and they integrate with our APIs.

A lot of the decisions that we make around architecture and technologies come from the challenges that we face in the world of payments processing. First, we process a huge number of transactions, so we have a large volume that comes from a virtually unlimited number of users. Next, there is reliability and durability: we need to recover from outages without dropping or repeating transactions, because if we drop transactions, then money disappears, and if we repeat them, then we invent money out of thin air. Neither is really good, although, you know, hopefully if we gave you more money, you wouldn't be upset with us. And thirdly, there is quite a bit of maintenance. The external payment infrastructures that you saw run a few different payment schemes, and they also update those schemes and their endpoints, so it can be difficult for a client with a smaller engineering team to keep on top of all of these changes. This is another thing that we are able to take care of.

All right, so this is the architecture that we had before the multi-cloud project. It's not Form3's very first architecture, but it will be our starting point today. We had our payment services hosted in AWS, and our platform is written in Go.
That's why you see the really cute gophers with their little stack there. We used SQS and SNS fan-out for messaging, and Postgres hosted on RDS as our database. Then, due to regulatory requirements for the FPS payment scheme, which stands for Faster Payments Service, we have to process transactions on leased lines, and this is why we actually started with a hybrid architecture. We have two data centers hosted by our partners at Equinix. The data centers run bare-metal Kubernetes and, again, have payment services written in Go, using NATS Streaming and CockroachDB for messaging and data, and they were meshed with Cilium as well.

The banking sector is heavily regulated, and that is one of the reasons why it was among the last to move to the cloud. As we take on more and more clients and have to process an increasing number of transactions, neither our clients nor the regulator want us to be dependent on a particular cloud vendor. Instead, we wanted to give our clients the peace of mind and the flexibility of running on multi-cloud. In particular, we wanted to treat the cloud as undifferentiated heavy lifting, allowing our clients to connect through whichever cloud vendor they wanted. That would also give us high availability, because clients could connect to whichever payment service was running, or whichever they preferred.

So, going multi-cloud: obviously that's going to be really easy, right? What could possibly go wrong? In fact, the team identified a large number of challenges from the very beginning. In particular, networking and service discovery were expected to be the most difficult parts of going multi-cloud, and we needed to keep our latency down, as our platform is strongly consistent. Of course, we could have opted to rebuild our platform from scratch on another cloud provider, GCP for example, but that would have been a very difficult and long project. So we decided instead to move to cloud-agnostic technologies, which is the story I'll be telling you today.

Okay, so first off, the challenge of deployment. We wanted our teams to have the same development and deployment experience regardless of which cloud they were running on, and regardless of whether it was the public or the private cloud. We needed to abstract away which cloud you're running in, so that it didn't really matter what you were doing. We also didn't want to hinder our teams in any way: development and deployment should continue to happen at the same speed they were used to.

So we decided to use Kubernetes. Kubernetes is so well known that I won't actually present it to you, but these are the main three reasons why we decided to go with it. First, it automates the deployment, scaling, and management of even the most demanding applications. Second, it is cloud agnostic and deploys workloads in any public or private cloud, which ticks the box of "I don't really care which cloud I'm running in." And third, with its operators and CRDs, it is extensible. So it's a really great system for us to use.

This is what we ended up doing: we deploy our payment services across three cloud vendors, AWS, Azure, and GCP. And we decided to opt for the managed Kubernetes offerings, because that allows us to offload a lot of the work and expertise needed to run the Kubernetes clusters.
Of course, we could have chosen to have the cloud providers provision VMs and run the Kubernetes clusters ourselves, but that would have been a lot more difficult and would have required quite a bit of expertise. From the very beginning, this is one of the things our engineers always do: push as much responsibility as possible to the specialists, if you can call them that, who maintain the public cloud. This allows us to use our engineering resources optimally and have our engineers do what they do best. So we like to push as much responsibility to the public cloud vendors as possible while avoiding lock-in.

This gives you an overview of our stacks and environments. We have multi-cloud stacks, as you can see, on AWS, GCP, and Azure, and deployments go in order from dev to test to production. Stacks contain exactly the same services; they are ring-fenced copies that allow our engineers to test their work end to end. We also have the option of running bespoke accounts for data-location and isolation requirements. Most importantly, this showcases the fact that we need to connect and manage many multi-cloud stacks, not just a couple.

Okay, so now we've deployed some services. So how do we connect them together? These are some of the requirements we had for our cloud connectivity. We needed a very fast networking solution that would form the backbone of our entire multi-cloud architecture. In particular, it should be resilient and fault tolerant, with automatic failover built in. And the latency should be kept down, since we didn't want the multi-cloud platform to have worse SLAs than before.

Okay, so I'm now going to give you a quick introduction to Cilium. This is a 101 session, so I'd like to make sure that everyone is on the same page. Cilium is built on the Linux kernel technology called eBPF. It deploys an agent alongside your nodes and servers, and it provides all of the functionality that you see on this diagram, which I've taken from their documentation: network policies, services and load balancing, and a whole bunch of metrics together with Hubble and Grafana.

So why did we choose Cilium? Cilium provides network policies at layers 3, 4, and 7, so it gives us a way to restrict egress from our multi-cloud platform. It is cloud agnostic, providing connectivity, service discovery, and load balancing across the clouds, making it really easy for us to deploy anywhere. And it has built-in observability with Hubble, which was written specifically for Cilium. We already used Prometheus and Grafana, so it was a great fit for our platform.

The way we decided to do multi-cloud connectivity was to leverage the connections we already had to our data centers. Our data centers already had highly available, fault-tolerant connections to our cloud architecture, so we decided to use those. We allocate CIDR ranges to our clouds, and the edge routers have very high-level routing built into them, so they are able to send traffic to the correct cloud. On the cloud side, we have gateways that forward the traffic on to the correct service. The cloud vendors provide native routing with their Kubernetes CNIs, and Cilium provides CNI chaining, so we're able to use Cilium together with the cloud vendor CNIs. We make sure that the address ranges of our stacks don't overlap, but because we choose very wide CIDR ranges, we have lots of room to grow. This allows us to have very low connectivity latency between our clouds, of only a couple of milliseconds.
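(As an aside, the no-overlap requirement is easy to sanity-check mechanically. Here is a minimal Go sketch of such a check using only the standard library; the per-cloud CIDR values are made up for illustration and are not Form3's real allocations. Because CIDR blocks are either disjoint or nested, it suffices to test whether either network contains the other's base address.)

```go
package main

import (
	"fmt"
	"net"
)

// overlaps reports whether two CIDR ranges share any addresses:
// two CIDR blocks overlap exactly when either network contains
// the other's base address.
func overlaps(a, b *net.IPNet) bool {
	return a.Contains(b.IP) || b.Contains(a.IP)
}

func main() {
	// Hypothetical per-cloud allocations, not real Form3 ranges.
	stacks := map[string]string{
		"aws":   "10.16.0.0/12",
		"gcp":   "10.32.0.0/12",
		"azure": "10.48.0.0/12",
	}

	var names []string
	var parsed []*net.IPNet
	for name, cidr := range stacks {
		_, ipnet, err := net.ParseCIDR(cidr)
		if err != nil {
			panic(err)
		}
		names = append(names, name)
		parsed = append(parsed, ipnet)
	}

	// Pairwise check: wide, disjoint ranges leave room to grow.
	for i := range parsed {
		for j := i + 1; j < len(parsed); j++ {
			if overlaps(parsed[i], parsed[j]) {
				fmt.Printf("overlap: %s and %s\n", names[i], names[j])
			}
		}
	}
	fmt.Println("check complete")
}
```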
Cilium also provides the ability to run a multi-cloud solution using cluster mesh, but that's not something we've built in. The good folks at Cilium have done an excellent presentation about cluster mesh on their blog, which I encourage you to read or watch if you want to learn more about it. We might decide to mesh our multi-cloud solution in the future, but right now it does not run in cluster mesh.

Okay, so now we've deployed our services and we've connected them, and it's time to pass some information between them. We had some requirements for our messaging system. First, it should support multi-cloud, because the whole thing we're talking about is deploying to multi-cloud. Then it should be persistent, because, again, we don't want to lose messages and we need to ensure that they get delivered. And it needs to have good Go client support, because our platform is written in Go.

So enter NATS JetStream. I'm going to give you a quick introduction to what NATS is, and then we'll look at how we use it. Core NATS is a very fast messaging technology: it provides very quick message delivery, but it doesn't provide any persistence guarantees. NATS JetStream, on the other hand, does provide exactly-once message delivery, and it has built-in persistence and durability guarantees, so we only use NATS JetStream. In NATS, messages are organized into subjects, and a stream defines the retention of a given subject. NATS servers are deployed and expose endpoints, and NATS clients connect to them and receive their messages in either push or pull mode. If you choose push mode, NATS will deliver your messages as quickly as it can, but if you go for pull mode, you get on-demand message consumption and you're also able to batch messages.

As for the reason we chose NATS JetStream: as you remember from our initial architecture, we did have some in-house experience with NATS Streaming, but we decided to use NATS JetStream for the multi-cloud project because it has the push and pull clients I was just talking about, and it also supports wildcard subscriptions, which is another really cool feature. It's cloud agnostic, open source, and written in Go, so the Go client support was obviously there, and it provides horizontal scalability as well as exactly-once message guarantees. So it's a very fast, very scalable solution that we decided to leverage in our multi-cloud platform.

This is what it looks like. The data centers run as one cluster: remember that there are two data centers, not one, plus a node in the cloud for replication, and together they make a single cluster. The multi-cloud side then makes another cluster, and the two clusters together form a NATS supercluster. Inside a cluster, servers use a gossip protocol to send messages between each other, and connecting the two clusters with the leaf node functionality, which you can read more about later, allows us to pass traffic between clusters. Our NATS servers are deployed in Kubernetes, which NATS also supports natively. This allows us to pass information between clouds very easily.
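(To make the subjects, streams, and pull consumers concrete, here is a minimal, illustrative sketch using the nats.go client. The server URL, stream name, subject, and durable name are assumptions for the example, not Form3's production configuration.)

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

func main() {
	// Connect to a NATS server (illustrative local URL).
	nc, err := nats.Connect("nats://localhost:4222")
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// Obtain a JetStream context for persistent messaging.
	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// A stream defines retention for a set of subjects; the
	// wildcard captures every payment-related subject.
	if _, err := js.AddStream(&nats.StreamConfig{
		Name:     "PAYMENTS",
		Subjects: []string{"payments.>"},
	}); err != nil {
		log.Fatal(err)
	}

	// The publish is acknowledged only once JetStream has persisted it.
	if _, err := js.Publish("payments.created", []byte(`{"amount": 100}`)); err != nil {
		log.Fatal(err)
	}

	// A durable pull consumer: we fetch batches on demand instead
	// of having the server push messages as fast as it can.
	sub, err := js.PullSubscribe("payments.created", "payment-worker")
	if err != nil {
		log.Fatal(err)
	}

	msgs, err := sub.Fetch(10, nats.MaxWait(2*time.Second))
	if err != nil {
		log.Fatal(err)
	}
	for _, m := range msgs {
		fmt.Printf("processed: %s\n", string(m.Data))
		m.Ack() // explicit ack so the message is not redelivered
	}
}
```

(The pull consumer's Fetch is the on-demand, batched consumption mentioned above; a push consumer would instead have messages delivered as fast as the server can send them.)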
Finally, we get to the database storage part, and we can look at CockroachDB. We had some requirements for our database solution. It should, of course, support multi-cloud, and if possible it should have SQL compatibility: as we were running our previous solution on Postgres, it would be great if we could keep using the same kinds of patterns, as opposed to rewriting a whole bunch of code on top of going multi-cloud. It should run easily in Kubernetes, and of course it should be scalable.

So what is CockroachDB? Cockroach is a distributed database, and you can write to and read from any node. It organizes data into ranges, which Cockroach rebalances and partitions accordingly. Each range is assigned a node that is the leaseholder, and that node is in charge of coordinating reads and writes to that particular range. And it is a distributed database that is strongly consistent and ACID-compliant.

So why did we choose it? First, the Postgres SQL compatibility was a huge plus for us, because it allows us to move our workloads over and use the same kinds of data structures that we had with Postgres. It's cloud agnostic and able to run in Kubernetes across vendors, which again is great for us. And it's strongly consistent, because it requires a quorum to write data to ranges, which was great for us as well.

Okay, so this is how we organize Cockroach in the multi-cloud. Our data has a replication factor of three, which is what I'm representing with the differently colored nodes: each range is replicated across each cloud. Because we have very fast connectivity between the clouds, we can spread the ranges across the clouds and still have strong consistency on each write. Furthermore, the Cockroach Postgres dialect has been a really seamless experience for our engineers to use in the application.

When it comes to the CAP theorem, CockroachDB is technically a CP system, because it requires a quorum to write. But we achieve availability with our multi-cloud setup, and Cockroach has built-in self-healing, which rebalances nodes and catches them up once they've been disconnected and then connect again. The good folks at Cockroach have done a nice series about the CAP theorem and how it applies to CockroachDB, which you can of course look up as well. It's a really cool series.
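(To show what that Postgres compatibility looks like in practice, here is a minimal, illustrative Go sketch. The connection string, database, and table are assumptions for the example; CockroachDB speaks the Postgres wire protocol, so a standard Postgres driver such as lib/pq works unchanged.)

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // standard Postgres driver; Cockroach speaks the Postgres wire protocol
)

func main() {
	// Illustrative local connection string; 26257 is Cockroach's default port.
	db, err := sql.Open("postgres",
		"postgresql://app@localhost:26257/payments?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Plain Postgres-style DDL and DML work unchanged.
	if _, err := db.Exec(`
		CREATE TABLE IF NOT EXISTS payments (
			id     UUID PRIMARY KEY DEFAULT gen_random_uuid(),
			amount DECIMAL NOT NULL,
			status TEXT NOT NULL
		)`); err != nil {
		log.Fatal(err)
	}

	if _, err := db.Exec(
		`INSERT INTO payments (amount, status) VALUES ($1, $2)`,
		100.00, "accepted"); err != nil {
		log.Fatal(err)
	}

	var count int
	if err := db.QueryRow(`SELECT count(*) FROM payments`).Scan(&count); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("payments stored: %d\n", count)
}
```

(Each write commits only once a quorum of the range's replicas, spread across the clouds in this setup, has accepted it, which is where the strong consistency described above comes from.)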
Okay, so this brings us to the end of the exploration of our platform, but I can leave you with three rules of thumb. If you're thinking about going multi-cloud, we recommend that you test your services end to end with a variety of load types. This has been absolutely crucial for us, and we use Toxiproxy to test our services under a variety of connectivity scenarios. Expect errors and retries: in multi-cloud, we have noticed that they are more prevalent, so make sure you design your services with this in mind. And remember to go cloud agnostic first of all, but also push as much work as possible to specialized services, and rely on the public cloud vendors while avoiding lock-in.

So this is the new stack after multi-cloud. We run our Go payment services on the managed Kubernetes offering of each cloud vendor. We use NATS JetStream as our event bus, which runs across the clouds, and we use CockroachDB as our distributed SQL storage. We haven't changed anything on the other side, the data center side, but there is an ongoing project to replace NATS Streaming with NATS JetStream, as NATS Streaming is being deprecated.

So this is the end of my talk. Thank you so much for listening. Remember to check out our podcast and our engineering site, as well as our Twitter account. We're very proud to say that we have a lot of engineers from Form3 at KubeCon, so do come to our stand in Pavilion One and we can chat more about multi-cloud, and, yeah, just embrace the community feeling of this conference. Thank you.

All right, thanks for the great presentation, Adelina. We've got a few more minutes for questions. If you have any questions, I'm going to pass the mic around, so please don't hesitate, put your hand up. Okay, so, any questions? Oh yeah, here we go.

You said that the client can choose which cloud he wants to run on. Is it also possible that one client says it doesn't matter and there are multiple clouds working for him, or does he actually have to choose one?

So I may have phrased that badly. The question was about how our clients choose which cloud provider they connect to. Our clients have high-availability connections to all of our cloud providers, so they can connect to any payment service and can choose exactly which one to connect to. They don't have to connect to only one and then be stuck with it. And if there is an outage, they can go and connect to another provider. Thank you.

Any other questions? Oh yeah, sure.

Thank you for a great talk. I want to ask how you handle private keys and key sharing and secrets across multiple clouds. Is that something you have solved as well?

That's an excellent question. The question is about how we secure connections in multi-cloud. All of the connections between the clouds are secured, but I'm not a security expert, so we'll be happy to tell you more about exactly how the client-side security works if you come to our stand. Thank you.

All right, any more questions? By the way, there's a mic in the center, so you can just step up and ask. All right, I think it was a really good presentation; you already explained a lot, and people understood. Okay, thanks again. Thanks for a great presentation today, thanks for attending this session, and I hope you enjoy the rest of KubeCon. Thank you. Thank you very much, everyone.