Hello, folks. Thank you for attending the last session of the day. You guys are so brave. Thank you. I am Dominic, and this is my colleague, Sean. We work for Naver, mainly in the cloud-native area. I am also a maintainer of Apache OpenWhisk, which is a cloud-native open-source serverless platform. So how many of you have heard of Naver? How many of you? Yeah, OK. Thank you. Honestly, I expected no one. Anyway, Naver is one of the biggest companies in South Korea, even though few people here know it. We have been running many cloud-native platforms for several years. So today, we will talk about how to build a multi-cluster L7 load balancer that is resilient to IDC failures. OK. From now on, Sean will take over the talk. Please welcome him, and I will be back soon.

Hello. I'm Sean from Naver in South Korea. I'm glad to be here, and it's an honor for me to speak. I will start the presentation by explaining why we chose this subject, and then I will talk about the challenges that we encountered while developing the L7 load balancer for multi-cluster and operating it. Before I start, let me briefly introduce Naver. The name Naver signifies a person who navigates the internet world. Naver is the 10th largest company in South Korea, with a $23 billion market valuation. Naver provides various services. Search is one of Naver's core services, with the largest market share in South Korea. Naver also provides commerce services such as Shopping Live, and recently acquired Poshmark in order to provide global service. It is also involved in the fintech industry, in the content industry with Webtoon and Snow, and in the cloud industry.

From now on, I will explain the reason for concentrating on the L7 load balancer for multi-cluster. As shown in this diagram, significant outages can be caused by various factors, but infrastructure-related causes, such as power, network, and cooling problems, are the most common. In other words, IDC-related failures are behind most outages. One famous IDC incident was the Delta Air Lines data center outage in 2016. An electrical equipment failure caused outages, about 2,000 flights were unable to depart over three days, and the result was about a $150 million loss. There is another incident, the Global Switch data center outage caused by a UPS failure, where 85% of user services were restored only after six hours of recovery work. According to the Uptime Institute, these significant outages are happening more frequently, their duration is getting longer, and they are causing increasing economic losses. Between 2019 and 2021, the percentage of outages costing more than $100,000 climbed from 39% to over 60%. And last year, there was also an IDC failure in South Korea which affected Naver services, but the damage to Naver services was relatively small because the applications were deployed across multiple IDCs. As you can see, to mitigate the damage from IDC failures, applications must be deployed to multiple IDCs. Furthermore, the load balancer in front of the applications must distribute traffic across multiple IDCs. Naver is currently operating 58 Kubernetes clusters in eight regions, which allows applications to be distributed across multiple IDCs.
To provide seamless services for distributed applications, we are operating L7 load balancers across 14 Kubernetes clusters in five regions. From here, I'd like to share the challenges we encountered when developing and operating the L7 load balancers, as well as how we overcame those challenges. As you listen, I suggest you imagine that you are developing your own load balancer that is resilient to IDC failures.

The first challenge is IDC failover. Our objective was to develop a load balancer that distributes traffic to multiple IDCs. The load balancer must be deployed across multiple IDCs, and to do this, multiple IPs must be mapped to a single domain. Additionally, when an IDC experiences a failure, routing to that IDC must be excluded. So there was a need to develop a dynamic DNS to handle the IDC failover challenge.

The second challenge is about layer 7 features. Our load balancer must support various L7 functions. It must be intelligent enough to understand layer 7 protocols, enabling L7-based routing, circuit breaking, rate limiting, traffic shifting, and other functions.

The third challenge is programmability. We also wanted to make sure clients could update the configuration of the load balancers themselves, rather than having to call an administrator whenever an update is required. In other words, all functions of the load balancer should be adjustable in real time via APIs. The challenge here is that user-invoked APIs work synchronously, but Kubernetes works asynchronously. Therefore, there is a need to bridge the gap between the two.

The fourth challenge is global availability. If a user's application is deployed internationally, routing based on the locality of the application should be considered. In other words, depending on the application's requirements, it might be necessary to route traffic only within the same region. And if some region or IDC experiences a failure, failover to another region must also be available.

Finally, as our load balancer system evolves, changes to its functionality or structure may impact user data. Therefore, all data must be easily migrated, even without user intervention.

In summary, our task was to develop a cloud-native load balancer, and we call it ELB — yes, the same name as Amazon's. Let's have a look at how we handled these five challenges.

First of all, let me describe the simple ELB architecture that Naver is now operating. We will explain more later, but we are using Istio and Envoy to build ELB. The DNS server and the first-tier Envoy manage user requests as they enter the ELB cluster. The first-tier Envoy is shared by many different services, and this structure allows for the efficient use of public IP addresses. If you have enough public IP addresses, you don't need to apply this divided first/second-tier Envoy architecture. Then there is a second-tier Envoy, which handles user-specific routing. The second-tier Envoy is created for each user service and is used to deliver L7 functionality, such as L7 routing, HTTPS redirection, health checks, and so on. For rate limiting, there are also a rate-limit component and a Redis cluster; working with the second-tier Envoy, they apply the rate-limit configuration of each user service — I'll show a rough sketch of this in a moment. And there is also a health check component that checks the status of every ELB cluster. The health check service monitors the status of the first-tier Envoys and automatically excludes DNS records if an unexpected response is received.
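To make that rate-limit component a little more concrete, here is a minimal Go sketch of a fixed-window decision against Redis. This is only an illustration: the key naming, the one-second window, and the go-redis client are all our assumptions, not Naver's actual code, and Envoy's real global rate limit service speaks gRPC rather than being embedded like this.

```go
// Minimal fixed-window rate-limit decision against Redis (illustrative sketch).
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

// allow reports whether the service identified by serviceID has made
// fewer than limit requests in the current one-second window.
func allow(ctx context.Context, rdb *redis.Client, serviceID string, limit int64) (bool, error) {
	// One counter per service per second; the key layout is hypothetical.
	key := fmt.Sprintf("ratelimit:%s:%d", serviceID, time.Now().Unix())

	count, err := rdb.Incr(ctx, key).Result()
	if err != nil {
		return false, err
	}
	// Let each window expire on its own so stale counters are cleaned up.
	if count == 1 {
		rdb.Expire(ctx, key, 2*time.Second)
	}
	return count <= limit, nil
}

func main() {
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	ok, err := allow(context.Background(), rdb, "my-service", 100)
	fmt.Println(ok, err)
}
```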
Finally, there are also API servers and etcd to handle the user requests.

Now, let me explain how we deal with the IDC failover challenge. As seen in this diagram, a DNS query for the domain pool.naver.com will be answered with four ELB A records, ranging from 10.10.10.10 to 10.10.10.13. Normally, traffic can be balanced across all four ELB clusters. However, if an IDC experiences a failure, we need to detect it and invoke the dynamic DNS action to exclude the 10.10.10.10 and 10.10.10.11 DNS records. For reference, only the pool.naver.com domain is shown here, but all domains registered in ELB will be affected.

To detect IDC failures and invoke the dynamic DNS action, we operate a health check component. As I mentioned earlier, the health check service is responsible for checking the status of every ELB cluster's gateways, and it operates in a full-mesh configuration across all ELB clusters. This key format is used to distinguish the state of each ELB cluster's gateway. Here are some examples. The following shows the status checks of cluster 1's gateway 0 performed by clusters 2, 3, and 4. With these results, cluster 1's gateway 0 will be considered healthy. If there is a connectivity issue or an IDC failure, the results will be written as failed, and the health check service will then define cluster 1's gateway 0 as unhealthy and dynamically exclude its DNS records.

Our DNS servers are configured in a hidden-primary and secondary architecture. With this architecture, we can protect the hidden primary DNS from unpredictable threats. Without integration with the secondary DNS servers, changes on the hidden primary cannot be dynamically reflected on the secondaries. For integration, we made the hidden primary DNS send a NOTIFY message to the secondary DNS, which triggers an AXFR zone transfer to obtain all DNS records from the primary. It's important to note that sending a DNS NOTIFY message whenever a user updates a domain could cause significant load on the secondary DNS. To avoid this, we operate a separate notifier component that triggers the primary DNS every 10 seconds. As a result, the hidden primary sends a NOTIFY message at most once every 10 seconds. Through this procedure, the secondary DNS responds with dynamically updated DNS records. Finally, we can handle the IDC failover challenge with dynamic DNS and the health check component. From now on, Dominic will take over. Dominic?

OK, thank you, Sean. Hi, I'm here again. Now let me explain how we provide the L7 features. Strictly speaking, this is not a challenge, but we wanted to show you what the load balancer that we are building looks like. So first, let me introduce Istio and Envoy. Istio is an open-source service mesh, with Envoy as its proxy. It has various L7 features. It is mainly used for service mesh, but we can also use it as a dynamic L7 proxy. I'm not going to talk about Istio and Envoy deeply in this session, but if you are trying to build an L7 load balancer, you should definitely consider them. And this is how we designed the system at an abstract level. ELB has many different models to effectively provide L7 features, such as service, routing, domain, backend, et cetera.

The first one is the service model. The service model is a logical unit that includes the other models. So when a service is created, the underlying Kubernetes objects, such as Namespace, Service, Ingress, Gateway, et cetera, are also created. And this is the structure of the service model — roughly like the following sketch.
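As a rough Go sketch of what this model might look like: the fields named in the talk (metadata, regions, config) are kept, while everything else, including the concrete config knobs, is hypothetical.

```go
// Hedged sketch of the ELB service model; only the fields mentioned in
// the talk are taken from the source, the rest are assumptions.
package model

import "time"

// Metadata is shared by every ELB model.
type Metadata struct {
	Name      string    `json:"name"`
	CreatedAt time.Time `json:"createdAt"`
	UpdatedAt time.Time `json:"updatedAt"`
	Revision  int64     `json:"revision"` // compared by the reconciliation loop
}

// ServiceConfig holds auto-scaling and connection-related settings.
type ServiceConfig struct {
	Replicas    int           `json:"replicas"`    // hypothetical auto-scaling knob
	IdleTimeout time.Duration `json:"idleTimeout"` // hypothetical connection setting
}

// Service is the logical unit that owns the other models. Creating one
// also creates the underlying Kubernetes objects (Namespace, Service,
// Ingress, Gateway, and so on).
type Service struct {
	Metadata
	Regions []string      `json:"regions"` // e.g. ["kr", "us-east"]
	Config  ServiceConfig `json:"config"`
}
```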
So basically, it has some metadata, such as createdAt, updatedAt, revision, and so on, and this metadata defines the basic structure of all ELB models. The service model in particular has the regions and config fields. The regions define which regions the service should be deployed to, and the config allows us to configure auto-scaling or connection-related settings. And this is what our UI looks like. When a service is created, the default domain is also created, and the user can then access the other UIs for routing, backends, domains, and so on.

The next one is the domain model. The domain model defines a domain to be used in the load balancer. A corresponding DNS record is also created when a domain is created, and Istio Gateway and VirtualService objects are created internally. If you are familiar with Istio and Envoy, you can easily understand what's going on under the hood with those objects. This is the structure of the ELB domain. It has fields like host and certificate, and some HTTP options, like HTTPS redirect. And this is what the domain creation UI looks like; it just corresponds to the model, with the same fields and options. There are many other models, but I'm not going to talk about all of them — I don't want to spend too much time explaining similar content, so I put them in the appendix. If you want to look into them, you can refer to the appendix.

The next topic is maintenance mode. For whatever reason, you may want to put a certain cluster into maintenance mode: you may want to upgrade your Kubernetes version, or restart the container network modules, and so on. In such cases, we should rule out the entire cluster. Whenever we put a certain cluster into maintenance mode, the IP addresses associated with that cluster should be excluded from all DNS records. This is simple to achieve because we have our own dynamic DNS.

Now I'm going to talk about programmability. When a user communicates with our system, the API call is synchronous, while the underlying Kubernetes is fully asynchronous. Because of these different ways of processing requests, there is a gap between the two, and this is where declarative APIs come in handy. I believe you are already familiar with declarative APIs if you are familiar with Kubernetes, GitOps, and so on. Declarative APIs imply that we define a required state, and an appropriate module monitors this required state and applies changes until the system reaches it. To compare against the required state, we use a revision — actually, not only the revision but also the actual object is considered, but in any case we need to compare against the required state. Even while we are applying a change, the required state can be changed again and again; it can even become a deleted state while we are still creating an object. As a result, our system must constantly retry operations and check whether we have reached the required state. This part is handled by the reconciliation loop. It is actually quite similar to a Kubernetes operator. The loop runs repeatedly, so it must be idempotent. We also used a producer-consumer pattern to define a state and handle the changes. With this approach, we can respond to the client synchronously while our consumers apply changes asynchronously. And each model has its own producers and consumers.
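As a hedged illustration of this producer-consumer flow, here is a minimal Go sketch over clustered etcd. The key layout, the reconcile stub, and the endpoint address are assumptions for the sake of the example, not Naver's actual code.

```go
// Sketch of the producer-consumer flow: the API server writes the
// required state synchronously; consumers in every cluster watch it
// and reconcile asynchronously.
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// produce is called by the API server: it records the required state
// and returns to the client synchronously.
func produce(ctx context.Context, cli *clientv3.Client, name, spec string) error {
	_, err := cli.Put(ctx, "/elb/services/"+name, spec)
	return err
}

// consume runs in every cluster: it watches the required state and
// starts a reconciliation loop for each change, asynchronously.
func consume(ctx context.Context, cli *clientv3.Client) {
	for resp := range cli.Watch(ctx, "/elb/services/", clientv3.WithPrefix()) {
		for _, ev := range resp.Events {
			go reconcile(string(ev.Kv.Key), string(ev.Kv.Value))
		}
	}
}

func reconcile(key, spec string) {
	// Placeholder: apply Kubernetes objects until the required state is reached.
	log.Printf("reconciling %s", key)
}

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()
	go consume(context.Background(), cli)
	_ = produce(context.Background(), cli, "my-service", `{"regions":["kr"]}`)
	time.Sleep(time.Second) // give the watcher a moment in this demo
}
```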
And this diagram shows how producers and consumers interact in a multi-cluster structure. Since our etcd is clustered, all events can be delivered to the consumers in each cluster. When an API server receives an API call, it produces an event that defines the required state. This event is consumed by the consumers in each cluster, and they start a loop to manage the underlying Kubernetes objects.

For each model, we have predefined sets of operations. In this example, when a service is created, we need to create those Kubernetes objects, so we define an operation for each Kubernetes object. Our loop executes those operations and checks against the required state, and if we haven't reached the required state, it runs again and again. We made the loop generic enough that it can be shared among the other models. As you can see in the initResources function, it returns a list of operations, a target revision, and a termination operation. A loop operation has a string field to indicate the object type of the operation, and it also has a function that contains the primary logic of each operation. Again, these are generic enough to be used by any model. The termination operation is a function that returns canTerminate, which indicates whether we can terminate the loop or not. This is an example of the service operations: it has the object type for each operation and the actual functions to execute. And the main logic of the loop looks like this: as I said, first it initializes resources by calling the initResources method, then it runs the given operations one by one, and finally it runs the termination operation to see whether we can terminate the loop. This is an example of a termination operation. Whenever a loop runs, we store the results to etcd. This is because our processes are completely asynchronous, so we need to store the final and intermediate results so that our UI can track the state changes. Finally, we compare the revisions to see whether the loop is complete.

The next challenge is that our LB should be globally available. This means we must dynamically provision resources to the required regions only. In this case, if the service's region is Korea, the necessary resources should be provisioned in Korea only; we don't want to provision resources that will go unused. When the service's regions are changed to Korea, US East, and Europe, we must also provision resources in the new regions. Similarly, we may no longer want to use a certain region, and then we need to clean up all the resources in that excluded region. So when we run a loop, we must first determine whether the region of the cluster is included or not: we compare the region of the cluster with the service's regions. If it is not included, we try to fetch a namespace, and if there is a namespace, there are possibly some resources to clean up. In this way, even if the consumers in each cluster receive the same event, we can dynamically provision or clean up resources, as the loop sketch below shows.

Another important thing is that when a service is running across multiple regions, it is usually necessary to serve traffic from a certain region within the same region to reduce latency. So traffic coming from Korea should be handled by an ELB running in Korea, and traffic coming from the US East should be handled by an ELB running in the US East.
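Here is the loop sketch referenced above: a hedged Go version of the generic reconciliation loop combined with the region check. All type and function names (Operation, Resources, Run, cleanup) are modeled on the talk's description — initResources, object-type strings, canTerminate — and are our assumptions, not the actual implementation.

```go
// Hedged sketch of the generic reconciliation loop with the
// region-based provisioning / cleanup decision in front of it.
package loop

import (
	"context"
	"time"
)

// Operation pairs an object-type label with the logic that creates or
// updates that Kubernetes object idempotently.
type Operation struct {
	ObjectType string
	Apply      func(ctx context.Context) error
}

// Termination reports whether the required state has been reached.
type Termination func(ctx context.Context) (canTerminate bool, err error)

// Resources is what initResources returns for a model.
type Resources struct {
	Operations     []Operation
	TargetRevision int64
	Terminate      Termination
}

// Run executes the loop for one event in one cluster. clusterRegion and
// serviceRegions drive the dynamic provisioning decision; cleanup stands
// in for "fetch the namespace and remove leftover resources".
func Run(ctx context.Context, clusterRegion string, serviceRegions []string,
	initResources func() Resources, cleanup func(ctx context.Context) error) error {

	// If this cluster's region is not included in the service's regions,
	// there may be leftover resources to clean up, and then we are done.
	if !contains(serviceRegions, clusterRegion) {
		return cleanup(ctx)
	}

	res := initResources()
	for {
		for _, op := range res.Operations {
			if err := op.Apply(ctx); err != nil {
				break // operations are idempotent, so just retry next pass
			}
		}
		// The real loop stores intermediate results to etcd here so the
		// UI can track state changes.
		if done, err := res.Terminate(ctx); err == nil && done {
			return nil
		}
		time.Sleep(time.Second) // back off before the next pass
	}
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}
```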
For this region-local routing, we rely on a GSLB product. I believe you can use any GSLB vendor, such as Akamai, because our system design is vendor-neutral. To have more control over DNS, we configure the GSLB with region-specific CNAMEs. So even if two different clients from two different regions resolve the same domain, they receive different responses. These CNAMEs are finally resolved by our own DNS server, so we can dynamically change the underlying IP addresses for a certain domain. In this way, we can optimize latency based on the location of the client.

Not only the location of the client matters; it is also important to route traffic to the nearest backend. This is achieved with the locality of backends. As you can see, we can configure the region and zone for each endpoint, and with this location information, ELB can route traffic to the closest backend. Internally, this is achieved by endpoint priority: we have a priority for each endpoint, and the endpoint with the highest priority takes precedence. Actually, the meaning of "closest" can be different in each region, so a different priority per region is specified against the same data — here, Europe has the highest priority. We can also configure locality failover to redirect traffic to another region in case a region fails. This is also possible with endpoint priority. In this example, if the US East region is not available, traffic will be routed to the Europe region, because the Europe endpoint has the second-highest priority; so we can prioritize the Europe region above the Korea region.

The last challenge is data migration. There are three cases relevant to data migration. The first is a model change: in this example, the new field region is introduced to the domain model. The second is a change in operations: even if the model remains unchanged, the corresponding operations can change. The third is model dependency. It is not strictly related to data migration, but it is critical for managing data in a consistent manner, so I brought this case as well. In this example, the routing model holds the host of a domain; when a new domain is created, the new host should be applied to the routing objects. This implies that the two models are interdependent, and because there are so many models, the dependency graph can become convoluted, so it is critical to manage them correctly in the first place. OK, in the following slides, I will explain how to handle each case.

First, the model change is straightforward: we simply update all the data. To do that, we have implemented migration APIs. We defined a specific API path for migration APIs, which includes a date and a name, just for migration management purposes. For example, we can have these kinds of APIs: we can get all the backend data that needs to be updated, and use the POST API to actually update all the data. In this manner, we define a pair of APIs whenever a model changes. Of course, it is absolutely crucial to keep the system backward compatible, even when there is no data for the new fields yet. A rough sketch of such a pair follows below.

The second case is operation changes. To handle this, we created one API called sync. It manually starts the loop, so the loop can be started without updating any data, and it will perform the new operations based on the updated code. For this reason, our loop must consider not only the revision but also actual object changes.
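Here is that sketch of the paired migration APIs in Go. The date-plus-name path scheme matches the description, but the concrete path, handler bodies, and the plain net/http layout are assumptions for illustration, not the actual API.

```go
// Hedged sketch of a migration API pair: GET lists objects that still
// need migration, POST actually rewrites them.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

func main() {
	// The path encodes a date and a name for migration management;
	// this particular path is a made-up example.
	http.HandleFunc("/migration/20230419-backend-region", func(w http.ResponseWriter, r *http.Request) {
		switch r.Method {
		case http.MethodGet:
			// In reality this would scan the stored models for objects
			// missing the new field; the list here is a placeholder.
			candidates := []string{"backend-a", "backend-b"}
			json.NewEncoder(w).Encode(candidates)
		case http.MethodPost:
			// POST rewrites all matching data. Per the data management
			// principles, writes go through the producer path only.
			w.WriteHeader(http.StatusAccepted)
		default:
			w.WriteHeader(http.StatusMethodNotAllowed)
		}
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```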
The final case is model dependency. Here, the easiest approach could be to just update the routing data whenever a domain is created, but that can easily conflict with a user's request. So instead of updating the data, we defined another interface that triggers an event. It simply triggers that event, the loop fetches the dependent data — the domain data in this case — and after that, the consumer applies changes based on this data.

OK, so in this way, we defined our own data management principles. In a given consumer, we do not change the models of other modules. We do not make consumers write any data; data is written by producers only. Whenever the data model changes, we implement migration APIs. When there are modifications to operations, we handle them on the consumer side without altering any data. Finally, all operations must be idempotent and backward compatible so that loops can be safely repeated.

OK, so let me summarize our presentation. We showed you five challenges that you will face when you build an L7 load balancer. You can handle IDC failures with dynamic DNS and cross-IDC health checks. For L7 features, we introduced what Istio and Envoy are and how we designed the ELB models. We demonstrated how to design declarative APIs, a reconciliation loop, and the producer-consumer pattern to handle the gap between synchronous and asynchronous processing. To make the LB globally available, we had to dynamically provision resources and route traffic based on the locality of the client and the backend. And we also discussed the data management challenges and how to address them. OK, this is the end of the talk. Thank you for listening. Do you have any questions?

You talked about data migration. I assume that you mean the data of the load balancer itself, that is, the configuration data, right?

Yes.

And you said there is a sync API for doing that. What happens if one cluster is down and comes back, and the data changed in the meantime? Does the loop recognize that there's a mismatch in the data?

Yeah, OK. Actually, we store the data in etcd. If a certain cluster is not available at some point, then when we restore the cluster, it will initiate the loop for all data. Since the loop is idempotent, we can safely start the loop again and again. Whenever our API server starts, it also starts the loops for all data. If an operation is already applied, the loop can safely skip it, and if there is no data or no Kubernetes object, it will just be provisioned at that time.

Thank you. For the loop implementation, do you have multiple workers working on that loop, or just a single main worker?

There are multiple API server instances, and one of the instances will start the loop. This is based on an etcd transaction: every instance tries to acquire the transaction first, and the one that succeeds starts the loop.

I see. So you rely on etcd to lock that, so only one worker can work at the same time?

Yes, only one instance works at a time.

OK. Thanks.

No more questions? Thank you for attending this last session. Thank you, guys. Thank you.