Hi everyone, thanks for attending this session: production-scale containerized game platform practice at ByteDance. I'm Chen Yu. Today Viktor and I will give an introduction to this topic. It's my great honor to have this presentation. I will give a brief overview of how we're leveraging open source frameworks to form our application platform, and then Viktor will give a demonstration to show how we deploy a game application.

First, let me briefly introduce myself and Viktor. My name is Chen Yu Zhang, and I am a software engineer and researcher at ByteDance, based in Mountain View, United States. My main interests and working areas are application orchestration and the Kubernetes scheduler algorithm. Viktor is a developer advocate at Upbound and a maintainer of Crossplane, one of the key components we leverage to manage resources across different cloud providers.

Today's talk has three parts. In the first part, I will give an overview of ByteDance and its game platform. Because there are various types of games and many aspects to running them, such as security, we won't cover every aspect; we will only cover how we provide containerized stability and application orchestration for games. Within this scope, in the second part we will talk about the practice at ByteDance: how we manage game servers in the Kubernetes ecosystem, leveraging different open source projects for the platform. Then, in the third part, Viktor will give a brief demonstration of deploying a game application across multiple clouds.

So, let's start with some background. Founded in 2010, ByteDance's mission is to inspire creativity and enrich life. It offers a suite of dozens of products, including TikTok, Helo, and Resso, as well as platforms specific to the Chinese market, including Toutiao, Douyin, and Xigua.
ByteDance has made it easier for people to have fun and to connect with, consume, and create content. In the games area, we run a big platform as a gateway that provides games to millions of players around the world; some titles are popular in the Asian market, such as Mobile Legends and One Piece. The games at ByteDance use different infrastructure to host their services on the platform.

In recent years, as game studios have grown rapidly and player numbers keep rising, the traditional way of deploying game servers has run into more and more problems and challenges. This scale brings the following challenges to the platform. More and more games join the platform, and millions of players put a heavy workload on the servers. Player distribution also brings problems: for latency-sensitive games, to provide a better playing experience we need to deploy across different regions and different cloud providers, and to limit the impact of a game server breakdown, a modern game server has many complicated components. The result is heavier work for the deployment and maintenance of game servers. On one hand, we need to run a large number of instances with a complicated architecture; on the other hand, we need to manage different infrastructure resources across multiple clouds.

Therefore, we were seeking a way to free up part of the operational effort, and it was natural for us to investigate a cloud-native way of deployment. The main benefits are listed below. The first is directly about money: in the old days we used to provision virtual machine instances manually, which was hard to manage and wasted a lot of cost on virtual machines.
Using containers, we can easily use Kubernetes to control scaling, so we can reduce cost as well as the maintenance burden of high availability. The second is that we can deploy the application statelessly from a template; Kubernetes takes care of the rest, and we don't need to consider the configuration of each game server. The third is that Kubernetes handles fault tolerance, so we don't need to think about manual failover, and the architecture becomes simpler. The last and most important is that we can easily compose the application and provide flexibility for customization.

Now let's talk about the practice at ByteDance. The optimized workflow is shown as follows. In the old days, developers had to sit down with the SRE team and put a lot of effort into deployment and daily operations: configuration had to be rendered and shipped to each new virtual machine, we had to design a lot of service discovery mechanisms ourselves, and we had to launch resources in advance, which wasted a lot of time and money. Now, developers only need to focus on the application template and upload the image; the system automatically takes over everything else, such as service discovery. We can also use the runtime to manage games efficiently in daily operations.

As the overview figure shows, we have leveraged the following four open source projects in our platform architecture. Let's first discuss the high-level design, and then we will go into detail on each practice. Developers define a template of the server, including its virtual infrastructure resources. The system then uses triggers defined by the SRE team to distribute the application to different Kubernetes clusters across cloud providers.
The application is then rendered automatically by the KubeVela controller into real Kubernetes objects, including Agones GameServer workloads to host the game and Crossplane objects to manage the various IaaS resources. We also use the OpenKruise manager to inject system containers such as Filebeat.

Let's go into more detail. First and foremost is the containerization challenge for the game server. As we know, moving from virtual machines to containers, the main value is the ability of fast creation, rebuilding, and auto-scaling. We leverage Agones as the workload to provide containerization capabilities, including lifecycle management and auto-scaling for dedicated game servers in the Kubernetes system. With Agones, we can host a fleet, a pool of auto-scaled, ready game servers. When a player wants a server to play on and sends a session connection request to the backend platform, the platform sends an allocation request to the Kubernetes API server, which is watched by the Agones controller. The controller then allocates one of the provisioned game servers to the player. The controller also watches the status reported by the SDK sidecar of each game server, including health checks and player tracking.

For complicated applications, KubeVela uses the OAM model to turn unmanaged workloads into managed applications. Users and developers can define the components, their dependencies, and the operation rules in one place, with a standard. We let users define their own components as well as traits, and let the SRE teams define the operation strategy. For example, if a user would like to define a game server application with the game, one RDS instance, and one load balancer, they can define the whole application in a single YAML file. The controller will automatically spin up the game server and its corresponding resources in the target cluster.
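As a rough sketch of what such a single-file application could look like: the structure below follows the OAM `Application` format that KubeVela uses, but the component types, names, and image are hypothetical placeholders, not our actual platform definitions.

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: my-game                      # hypothetical application name
spec:
  components:
    - name: game-server
      type: agones-gameserver        # hypothetical component type mapped to an Agones workload
      properties:
        image: registry.example.com/my-game:1.0   # hypothetical image
    - name: game-db
      type: aws-rds                  # hypothetical component type backed by a cloud resource
      properties:
        size: small
    - name: game-lb
      type: aws-load-balancer        # hypothetical component type for the load balancer
```

With a model like this, the developer owns the components while the SRE team's definitions decide how each component type is actually realized in the target cluster.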
We can set up a strategy such that, once all the resources are ready, the service is exposed to players automatically. Also, for better runtime operations, we combine OpenKruise capabilities to provide advanced operations and a better operating and playing experience. We use the sidecar model to customize dedicated game servers. For example, some game servers may not need Filebeat, and some may need a log agent. Since users do not want to be disturbed by operations like system container updates, we use the SidecarSet in OpenKruise to manage them, so that we can do hot updates or in-place updates without disturbing users, providing a better experience.

Then, for the various IaaS resources from different cloud providers, we leverage Crossplane to help us build a unified control plane that manages cloud resources in a cloud-native way. This means we can simply create a resource in Kubernetes, have it correspond to a real-world one, and manage its lifecycle with Kubernetes concepts.

From this practice we can draw a conclusion: cloud native obviously brings a lot of advantages to a games platform. First, we can split a big game server into different parts and host them in different containers, providing high availability and efficiency for the service. Also, to avoid being locked into a single provider, it is natural for a game to choose different providers, and today we can use them as normal Kubernetes resources, with great efficiency, through Crossplane. For complicated applications, the OAM model is a great model to adopt. In the future, we will put more effort into this part, including OAM runtime management and a cost solver for choosing the best cloud resources. As a general review, now is a good time for game platforms to go cloud native. That's the overview of the practice on our internal platform.
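The SidecarSet mechanism mentioned above can be sketched roughly as follows; this uses the real OpenKruise `SidecarSet` API, but the selector labels, names, and image are hypothetical, not our production configuration.

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: game-log-agent               # hypothetical name
spec:
  selector:
    matchLabels:
      app: game-server               # hypothetical label on game server pods
  updateStrategy:
    type: RollingUpdate              # allows updating the sidecar without recreating game pods
  containers:
    - name: filebeat
      image: docker.elastic.co/beats/filebeat:8.5.0   # hypothetical log agent image
      volumeMounts:
        - name: game-logs
          mountPath: /var/log/game
  volumes:
    - name: game-logs
      emptyDir: {}
```

Because the sidecar is injected and updated by the SidecarSet controller rather than baked into the game server template, the system container can be upgraded in place without touching the game container.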
Now, thanks to Viktor, let him give a brief demonstration for our game server.

Unfortunately, we do not have time to go through the details and an explanation of the whole system that ByteDance is building, which is a pity, because they're building some really cool things and it would be very useful to go through the whole picture. But as I said, we do not have time for that. So right now we are going to focus on Crossplane, or to be more precise, we are going to go through a practical demo of how Crossplane works and how it fits into the system ByteDance is building.

So what do we need to do? What are the needs? We can split those into infrastructure, services, and applications. We might need to create some infrastructure, let's say a Kubernetes cluster. So we need to create and manage Kubernetes clusters, in plural. We need to manage our applications running in those clusters or somewhere else. And we need to manage services, like, let's say, a database. And we need to do all those things at scale, and in a way that anybody within an organization can leverage, can use those methods, that system, to create their own clusters, to deploy their own applications, to use the services, and so on and so forth. So it's just as much about managing resources at scale as it is about shifting left and enabling everybody in an organization to manage their own things, without having to spend months or years going deep into Kubernetes, into cloud, and so on and so forth.

So here's a scenario, a simple one. I want to run my applications in two clusters, and those clusters might be in different providers, let's say AWS and Civo, just to spice it up. Once we create those clusters, we need to manage some applications in those clusters and corresponding services. There will be a backend application and a database that the application is using. Let's start with clusters.
How would the definition for managing a cluster, creating, updating, and so on and so forth, look in AWS, or to be more precise, EKS? The definition could be as simple as this. I want to claim a cluster, and that cluster has a name, some labels that identify which type of cluster I want, in this case AWS and EKS, and some parameters. The size of the nodes should be medium. I do not know what the precise size I need in AWS is, but that does not matter, and I will show you later why it does not matter. And then, how many nodes do we need? Three in this case. In this context I mean the minimum number of nodes, because it is assumed that there is a cluster autoscaler that will scale it up or back down, but not below three nodes. And finally, for me to use the cluster, I need to generate a kubeconfig, and I will do that in a secret called a-team-eks.

Now you might be confused and say, hey, what is that thing, ClusterClaim? Well, that's something that an operator in a company created and exposed as a service. So this is a custom resource definition, with a controller, called ClusterClaim. And that ClusterClaim can have many different implementations. I will show those implementations, or how I came to those implementations, later. For now, just think of this as a very simple and easy way for everybody to consume a service, even though what's behind that service is potentially complex. And if I would like to get that cluster, the cluster defined in that YAML, all I have to do is execute kubectl, say, hey, the namespace should be this one, I want to apply whatever is defined in that file, and go.

Now if I would like to get a similar cluster, a cluster with similar properties, in a different provider, let's say Civo, then my manifest could look like this. This is yet another ClusterClaim, with matching labels that say, hey, this cluster should be in Civo and should be CK, or Civo Kubernetes.
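A claim along the lines just described might look roughly like this. The `compositionSelector` and `writeConnectionSecretToRef` fields are standard Crossplane claim fields, but the API group, kind, label keys, and parameter names are all defined by the platform operator, so treat the ones below as hypothetical.

```yaml
apiVersion: example.org/v1alpha1     # hypothetical API group defined by the operator
kind: ClusterClaim
metadata:
  name: a-team-eks
spec:
  id: a-team-eks
  compositionSelector:
    matchLabels:
      provider: aws                  # select the AWS/EKS implementation
      cluster: eks
  parameters:
    nodeSize: medium                 # abstract size; the composition maps it to an instance type
    minNodeCount: 3                  # the autoscaler will not go below this
  writeConnectionSecretToRef:
    name: a-team-eks                 # the generated kubeconfig lands in this secret
```

The Civo variant would keep the same structure and change only the name, the matching labels, and a couple of parameter values.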
There are a few parameters that are different from the other cluster's, simply because in this provider we might need a smaller node size and fewer nodes. But other than that, the definition is exactly, exactly the same, or at least the structure of the service is the same. And I can confirm that we're talking about the same definition, even though the providers are completely different, by diffing those two files. You can see here that the name is different, the ID is different, the labels are different, and some parameters are different, but both are based on exactly the same custom resource definition, created automatically; we are getting there. So to create that cluster, the process is the same: kubectl, this is the namespace, I want to apply whatever is defined there, go.

Now, to get to the point where we have services like this, custom resources with corresponding controllers, we need to do two things. We need to create a definition, and then implementations of that definition, which we call compositions. The definition itself is something like this. We have an OpenAPI schema that says, hey, whoever wants to work with those compositions should use this schema. And the schema has some properties, like ID and parameters, and within parameters we have version, node size, minimum number of nodes, and so on and so forth. A simple definition: an OpenAPI schema that defines the interface through which others can consume the service.

And then we can have as many implementations of that schema as we need. In this case, we are using EKS and Civo, but I also have implementations for GKE, for Azure, and so on and so forth. Each of those implementations is what we call a composition. And that composition implements all the details, everything that is required for somebody to manage a cluster without really going into the details of all the services required.
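In Crossplane terms, such a definition is a `CompositeResourceDefinition` (XRD). A minimal sketch matching the schema just described might look like this; the group, kind names, and parameter fields are hypothetical, since each organization defines its own.

```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: compositeclusters.example.org      # hypothetical group
spec:
  group: example.org
  names:
    kind: CompositeCluster
    plural: compositeclusters
  claimNames:
    kind: ClusterClaim                     # what consumers create in their namespace
    plural: clusterclaims
  connectionSecretKeys:
    - kubeconfig
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                id:
                  type: string
                parameters:
                  type: object
                  properties:
                    version:
                      type: string         # Kubernetes version
                    nodeSize:
                      type: string
                    minNodeCount:
                      type: integer
```

Each composition then declares that it satisfies this XRD, which is how one schema can have EKS, Civo, GKE, and Azure implementations side by side.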
EKS, for example, requires 10, 15, or 20 different resources to be combined together. But all of that is an implementation detail; it is in the hands of the operator who creates the service, not the consumer. And the same goes for Civo, which is yet another composition. It has a different implementation, but it is exposed as the same service, consumable by everybody. The important part here is that you are in charge of creating compositions. You are in charge of defining what it means to have something in your organization. This is not a vendor-opinionated solution; this is a tool that enables you to create your own opinions and, based on those opinions, services, and to combine those services into an internal developer platform.

Now let's take a look at what we got, what happened behind the scenes, when somebody applied those two simple YAML files. We can see what's going on; we can see all the resources managed by Crossplane by listing all managed resources. Here we have a bunch of role policy attachments in AWS, a couple of roles, and a number of objects, which are Kubernetes manifests that will be applied to that cluster later on, when the cluster is up and running. Because remember, it's not only about creating and managing clusters; it is about creating and managing production-ready clusters, and that's not only infrastructure but also the services running inside those clusters. Then we have a number of releases, which are Helm charts that will be applied to that cluster. We have a node group, a cluster, and a security group; we have subnets, a VPC, a route table, and an internet gateway, and all those things are AWS resources. And then we have a Civo Kubernetes cluster, which is in a completely different provider. That's how we create a cluster in Civo; unlike in AWS, there are differences. But all those things are implementation details, which are in the hands of SREs, or DevOps, or operators, and not consumers.
From the end-user perspective, all those details might be irrelevant, and we can say, hey, just show me what's happening with my clusters. In that case, we can just list the objects that were created, which are cluster claims. Here we can see that there are two clusters and neither is ready, because I didn't give it sufficient time. So let's do just that: let's wait for 20 minutes or something like that until AWS creates the cluster; Civo takes a minute or two, and then we can proceed. I will not bother you with waiting, I do not expect you to wait, so let me fast-forward to the end of the process.

And I'm back. Here we go: two clusters are created. You can see that by the Ready column set to true. So the clusters are done. We can see that in the case of EKS, the control plane and node pool are active. In the case of Civo, I made a mistake: I'm not populating those fields, so shame on me. Anyway, if you ignore that I made a mistake, you can see here that both clusters are ready, and that's what really matters. Now we can use them to deploy applications and services and a few other things.

But before we do that, let me retrieve the kubeconfig, which Crossplane stored in a secret after it created the cluster. Then I can use that kubeconfig to do some funky stuff. So: kubectl, et cetera, et cetera, and the outcome goes to kubeconfig-eks.yaml. Now we can use one of those two clusters. To be on the safe side, let's retrieve the number of nodes. And there we go: there are three nodes. That's my EKS cluster. So obviously it exists; it was created by Crossplane, and it is managed by Crossplane, with drift detection, reconciliation, and so on and so forth. Now I can use it. I can deploy my application into that cluster or any other cluster. In this case I'm using only two, but normally you should imagine dozens or hundreds of clusters being managed like this.
But before I proceed, let me retrieve the kubeconfig of the second cluster as well, so that we can use both. The second one is in Civo. And there we go, I have the second kubeconfig. Now I can really use those clusters to deploy applications. I can deploy the same applications or different applications, at scale or individually. I can do whatever I want, because that's what the framework, I mean the Crossplane platform, gives me.

Now, let's say that we want to deploy an application. We could create a Helm chart or something with the Deployments or StatefulSets, and Services, and virtual services, and so on and so forth. But I want to continue with the shift-left idea of enabling others. So I created a sample, a demo representation of an application. This is not the real deal because, you know, I cannot show you confidential stuff, but I do have a composition. So I can just say, hey, create something that is an AppClaim, an application claim, and that something will be a backend application, because in my organization I could create different compositions, one for backend, one for frontend, this or that, whatever number of variations I have. And I will provide it with a couple of parameters, the things that really matter, and hide the things that don't. In this hypothetical case, what matters is the namespace, the image, and the port through which the application should be exposed. Then I will apply that manifest to the Civo cluster. After that, I will retrieve all the resources inside that specific cluster in the production namespace and see what I got. And I got, and remember this is a silly demo, the Deployment, the ReplicaSet, and so on and so forth.

Now let's move on and make it more interesting. Let's say that we want not simply an application there, but an application with all the corresponding Kubernetes resources, plus a database, and that database in this case could be, let's say, RDS in AWS, and all of that should be connected and glued together.
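An application claim like the one just applied might be sketched like this. As with the cluster claim, the kind and group come from a custom definition created by the platform team, so all names, labels, and values here are hypothetical.

```yaml
apiVersion: example.org/v1alpha1     # hypothetical API group
kind: AppClaim
metadata:
  name: silly-demo
spec:
  id: silly-demo
  compositionSelector:
    matchLabels:
      type: backend                  # select the plain backend composition
  parameters:
    namespace: production
    image: registry.example.com/silly-demo:1.0   # hypothetical image
    port: 8080                       # port through which the app is exposed
```

Everything else, Deployment, Service, Ingress, and so on, is produced by the composition behind this claim.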
How would I do that without going into the details of how AWS works, or how Kubernetes works? How would I do that as a developer, as a consumer of a service? The definition could be like this. The first of the two resources defined here is almost exactly the same as the one I just applied to Civo. It is an application claim; everything is the same except the matching labels. This time, the type says backend-db, so the same definition will use a different composition, one that knows what to apply and which resources to create when working with backend applications connected to a database.

And then we have a second definition over there that says: I would also like to claim a SQL database, and that database, through the matching labels, should be running in AWS and should be PostgreSQL. Again, I can have, and I do have, different implementations: a database in Azure, a database in Google, a database in AWS, MySQL, PostgreSQL, and so on and so forth. But all those things are implementation details. The end user just needs to specify the labels: this is the provider, this is the type of the database, go. Well, it's not quite go; you need a couple of parameters, like: this is the version of the database I want, and this is the size of the database. Again, I do not know the real sizes in AWS, but I will say small, and you figure it out. The namespace is production. And finally: create a secret called silly-demo and put the database authentication there, so that the application can use it.

So let me apply that manifest with kubectl, et cetera, et cetera. You know the command. Two new resources were created, this time in the EKS cluster, not in Civo. Let's take a look at what we got. If I list all the resources, plus Ingresses, in the production namespace of that specific cluster, we can see that we got the Service and the Deployment; the Deployment created a ReplicaSet, the ReplicaSet created Pods, and we have an Ingress.
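The database half of that manifest might be sketched like this; again, the kind, group, label keys, and parameter names are defined by the platform team, so the ones below are hypothetical.

```yaml
apiVersion: example.org/v1alpha1     # hypothetical API group
kind: SQLClaim
metadata:
  name: silly-demo-db
spec:
  id: silly-demo-db
  compositionSelector:
    matchLabels:
      provider: aws                  # run the database in AWS (RDS)
      db: postgresql
  parameters:
    version: "13"                    # hypothetical engine version
    size: small                      # abstract size; the composition picks the instance class
  writeConnectionSecretToRef:
    name: silly-demo                 # credentials land here for the application to consume
```

The application's composition mounts that same secret, which is what glues the backend Pod to the database once both exist.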
There is nothing really special here, except the status of the Pod, which says CreateContainerConfigError. It is failing big time, but that's normal. It is failing because this composition knows that it should attach a secret with the database authentication, and it cannot attach the secret because the secret doesn't exist; and it doesn't exist because the database was not created yet. But when Crossplane creates the database, it will create the secret, and this Pod will automatically work, because the secret will be there and it will be able to instantiate itself. If we want to see the statuses of those specific resources, the application claims and SQL claims, we can just list those two, and then we see that the application claim is ready and the database is not yet ready, but it will be ready soon. It takes five minutes, I think, to create an RDS instance in AWS, so let me fast-forward to the end of the process.

Now, if I list all the managed resources, I can see a bunch of resources over there: a new security group was created, an internet gateway, a VPC, subnets, objects... no, the objects are from different claims... a DB subnet group, and, what matters most, the RDS instance, and it is ready. So my application should now get the secret it needs and connect to the database. Let me list the resources in the production namespace one more time, and there we go: the Pod is up and running, it got the secret it needed, it connected to the database, and both of them were created through compositions, and we can live happily ever after.

Thank you so much for watching. I hope this was useful. If you have any questions, and if we have any time remaining, we would be really happy to answer any inquiries you might have. Or even better, come to the Upbound booth and we can chat more over there; ask anything and I will answer. Thank you so much.