Hello, my name is Ndika Nandarias. I've been working in IT since 2008, and more precisely since 2016 in the innovation and technology surveillance department of the Basque Government's informatics society. So, how to survive high demand services? It sounds a little dramatic, like a dramatic title, but it's nothing you haven't heard before. We may do it differently, of course, but I'm sure we are all looking for something to achieve this goal. This presentation shows how we approach this problem in a high demand service called Itzuli.

Well, a little bit of context. Itzuli is a set of online tools to promote the use of the Basque language. The Basque Government knew that if a minority language like Basque didn't have its spot in the digital world, it would be a problem, or a risk, for the language itself. So Itzuli was born to solve part of the problem. It can translate between Basque and English, French and Spanish, and it can also perform text-to-speech and speech-to-text operations. All these operations are based on AI models trained with public and Basque Government data. Most of them run on GPUs for better performance, and to offer a more resilient service, they are deployed in a Kubernetes cluster. But what I am going to talk about is how we manage to keep this kind of service alive, not Itzuli itself.

To go through the presentation, I will use quotes from the TV series Game of Thrones, because in that show they have to survive too, and they have to face a lot of threats. So, having said that, let's start with a quote from Tyrion Lannister: "That's what I do: I drink and I know things." Well, the most important part of this quote is not the "I drink"; it's the "I know things" part. As we all know, knowledge is power, and for us the power comes from what we know about our services.

And the first things we all want to know about our services are numbers: numbers of requests and IPs. Well, nearly 400,000 requests in 12 hours are answered by these services. Those requests are made from almost 30,000 distinct IPs. For us, those are a lot of requests, and if we take into account that each request prints at least one log line, there is a lot of information to be processed.

In order to obtain that information, we have to collect those log lines. And what is the process we follow to collect them? Well, like I said, these services are deployed in Kubernetes, so we use the sidecar pattern to solve the log collection problem, deploying our applications with a log collector agent running in the same pod. In our case, the log collector is Fluentd. But why do we use the sidecar pattern? Well, because we want to deploy the log collector agent as a piece of our services but detached from them, and because it will run in an asynchronous mode: every minute, Fluentd will read the application logs and send them to Datadog.

So the logs are in Datadog. We can build dashboards and see statistics of the service. This information is very helpful for us because we can understand how people use our services, and at some point we can decide if we need more resources to improve them. But why do we use Fluentd and Datadog? We use Fluentd because it's open source, of course, it's simple and easy to use and configure, and it has a very small footprint. This last feature is the most important for us because, like I said, we are running this sidecar pattern, so we are going to have this container replicated in all our pods, and we don't want a container that needs a lot of resources.
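As an illustration, here is a minimal sketch of what such a sidecar deployment can look like, assuming hypothetical names, images, and paths that are not from the talk: the application container duplicates its stdout to a log file with tee, and the Fluentd sidecar reads that file from a shared emptyDir volume.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: itzuli-translate                             # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels: { app: itzuli-translate }
  template:
    metadata:
      labels: { app: itzuli-translate }
    spec:
      containers:
        - name: app
          image: registry.example/itzuli-translate:1.0   # hypothetical image
          # duplicate the container's standard output to a log file with tee
          command: ["/bin/sh", "-c", "./run-server 2>&1 | tee /var/log/app/app.log"]
          volumeMounts:
            - { name: logs, mountPath: /var/log/app }
        - name: fluentd                              # the log collector sidecar
          image: registry.example/custom-fluentd:1.0 # their Fluentd image with Node installed
          volumeMounts:
            - { name: logs, mountPath: /var/log/app, readOnly: true }
      volumes:
        - name: logs
          emptyDir: {}                               # shared between app and sidecar
```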
And why Datadog? Well, because it's a cloud service, it gives us the possibility to build real-time dashboards that we can share with people with no technical background, and we can perform tests and fire alarms from Datadog.

Okay, let's take a closer look at the Fluentd image. We don't use a standard Fluentd image; we have created our own. And you may be thinking: why do that? There are plenty of Fluentd images out there, for example in Docker Hub. Well, we wanted to make it easier for developers to read application logs and send them to Datadog. When developers print log lines, normally they choose the format of those log lines, so we think they are the best placed to program a script to read their own logs. And to make developers feel more comfortable programming that script, we installed Node in the Fluentd image so they can use JavaScript, because we think JavaScript is a more familiar language for them than, for example, a shell script. When they have that program, they provide it to us, and we deploy it in our Fluentd image to send logs to Datadog automatically.

Okay, now let's take a look at a Fluentd configuration file. I will only talk about three lines (a sketch of such a file appears below). The first highlighted line is where we set the path of the log file to be read by Fluentd. When we run our containers, we do it with a tee command to duplicate the standard output of the container to a log file, so we have to tell Fluentd where that log file is located. In the second highlighted line, we set the log rotate command: if we are going to have log files in our container, we have to take care of how many log files there are and their size. And the last highlighted line is where we set the program to be run by Fluentd to read logs and send them to Datadog; this is the script made by the developers. As you can see, we have some default configuration, and on the other hand we have environment variables that we can define in the Kubernetes deployment YAML file.

But informational logs are not the only key. We have to bear in mind a quote from Lord Varys: "My little birds are everywhere. They whisper to me the strangest stories." Well, pay special attention to the "they whisper to me" part. We need mechanisms to be aware of errors or strange behaviors in our services, and we need something that tells us what is going on in our services, that whispers those errors to us. But who are our little birds? Well, in our case there are no birds, but a dog: again, Datadog. But this time, we add another actor to the equation: Microsoft Teams. Well, we live in a collaborative world, so I'm sure you are all using a collaboration or communication tool like Microsoft Teams, Slack, whatever. Our daily working tool is Microsoft Teams, so we created a special channel with a webhook, and when an alarm is fired from Datadog, it sends us a notification to tell us what is happening.

And which are the alarms fired from Datadog? Well: errors, timeout errors, service errors, any log line with the word "error" in it. A "too many requests from the same IP" warning, because we want to know if somebody is making tons of requests to our services, just in case. A "no activity" warning, because we take it for granted that if there are no requests to our services, maybe something is wrong with them. And synthetic tests.
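Before the synthetic tests, here is the promised sketch of the kind of Fluentd configuration just described. It is a reconstruction under assumptions, not the team's actual file: the names, paths, and the use of the exec input plugin to run the developers' Node script are all hypothetical; only the one-minute cadence and the Datadog output come from the talk.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config              # hypothetical name
data:
  fluent.conf: |
    <source>
      @type exec                    # assumption: the developer script runs via the exec input
      run_interval 1m               # matches the "every minute" cadence described in the talk
      tag app.logs
      # (1) log file path and (3) developer-provided reader script; both paths are hypothetical
      command node /fluentd/etc/read-logs.js "#{ENV['APP_LOG_PATH']}"
      <parse>
        @type json                  # assumption: the script emits one JSON object per log line
      </parse>
    </source>
    # (2) the real file also sets the log rotate command for the container's log files (omitted here)
    <match app.logs>
      @type datadog                 # fluent-plugin-datadog output plugin
      api_key "#{ENV['DD_API_KEY']}"    # environment variables defined in the deployment YAML
      service "#{ENV['DD_SERVICE']}"
      dd_source fluentd
    </match>
```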
Okay, here I will explain the synthetic tests a little bit. We run two different tests. We test our API endpoints with HTTP calls, and if we don't get an HTTP 200 response code, we fire an alarm. And the other test is on our website: a navigation test.

So let's see some examples of notifications in Microsoft Teams. This is the "too many requests from the same IP" alarm. As you can see, basic information is sent in the notification, like the timestamp and the IP from which the requests were made, and also specific information about the service; in this case it is the translation service, so we also get how many words were sent in each request. Another example: a timeout error calling the API endpoint. Well, in this case it's a timeout error, so here we can see the timeout we set, 15 seconds, and the alarm was fired because the call took 60 seconds. Okay, with this configuration we are informed almost in real time of any problem, we can react to those problems very quickly, and we can analyze those errors from the service point of view.

But now we are aware of errors; we have information, or power. But like Cersei Lannister said: "What good is power if you cannot protect the ones you love?" Well, and again, Game of Thrones hits the nail on the head. Cersei is right: no matter how much information we have, we need to protect our services. So, considering that our services are simple services, our main concern is to mitigate denial-of-service attacks. Well, we could leave that job to the firewall or to other network control tools managed by other departments, but we love being able to control the whole service. At the same time that we were facing this problem, another part of the team was doing a proof of concept with the Istio service mesh. We were not looking for this feature in Istio, but studying the tool, we saw that it met the needs we had. So the solution was in front of us: use the Istio service mesh with a rate limit configuration.

Well, let's take an overview of the Istio service mesh. From top to bottom, we have the Istio ingress gateway, deployed in the istio-system namespace. It's like the core of the service mesh, because it manages all the requests made to our applications. And to configure that gateway, we have to deploy, in our application namespace, a Gateway, a VirtualService, and a DestinationRule. Well, we control this part, the application namespace. The most important resource for us is the VirtualService, because it defines the routes to our application's Service object and the criteria that requests have to meet to reach those services. The DestinationRule is important too, because with this resource you can tell Istio what happens to that routed traffic, for example the load balancing mode or the TLS security mode.

Well, now we have the basic architecture of Istio, and we have to add the rate limit configuration. So we have the common parts deployed in two different namespaces: the service mesh itself is deployed in the istio-system namespace, and the rate limit resources are deployed in an istio-common namespace. I'm not going to get into details on the istio-system namespace, because we have more or less already seen that in the previous slide, just into the other one. Well, to have the rate limit configured as we wanted, we deployed three resources: the rate limit service and two Redis databases, one for per-second requests and another one for per-minute requests. We split Redis in two for better performance, because we made some tests with one database and saw that it was not enough beyond a certain number of requests.
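Assuming the Envoy rate limit service (envoyproxy/ratelimit) is what sits behind this setup, which the talk does not name explicitly, the two-Redis split maps naturally onto that service's REDIS_URL and REDIS_PERSECOND_URL settings, and the per-service limits live in a descriptor ConfigMap of the kind the talk goes on to describe. All names, namespaces, and limit values below are hypothetical.

```yaml
# Fragment of the rate limit service Deployment in the istio-common namespace:
# one Redis for per-minute counters, a separate one for per-second counters.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ratelimit
  namespace: istio-common
spec:
  selector:
    matchLabels: { app: ratelimit }
  template:
    metadata:
      labels: { app: ratelimit }
    spec:
      containers:
        - name: ratelimit
          image: envoyproxy/ratelimit:latest           # hypothetical tag
          env:
            - { name: REDIS_SOCKET_TYPE, value: tcp }
            - { name: REDIS_URL, value: "redis-perminute:6379" }           # hypothetical host
            - { name: REDIS_PERSECOND, value: "true" }                     # enable the second Redis
            - { name: REDIS_PERSECOND_SOCKET_TYPE, value: tcp }
            - { name: REDIS_PERSECOND_URL, value: "redis-persecond:6379" } # hypothetical host
            - { name: RUNTIME_ROOT, value: /data }
            - { name: RUNTIME_SUBDIRECTORY, value: ratelimit }
---
# The kind of per-service ConfigMap described next: four chained keys
# (generic key -> client IP -> API token -> request path).
apiVersion: v1
kind: ConfigMap
metadata:
  name: translate-ratelimit-config   # hypothetical name
  namespace: istio-common
data:
  config.yaml: |
    domain: translate                # hypothetical domain
    descriptors:
      - key: generic_key             # (1) the name of the service
        value: translate
        descriptors:
          - key: remote_address      # (2) the client IP address
            descriptors:
              - key: api_token       # (3) the API token handed out to clients
                descriptors:
                  - key: path        # (4) the request path
                    rate_limit:      # a parallel descriptor tree would hold the per-minute limit
                      unit: second
                      requests_per_unit: 1
```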
Well, on the other side of the slide, we have the application namespace. Like I said before, every application under the umbrella of Istio has to deploy its own Gateway, its own VirtualService, and its own DestinationRule. With these three resources deployed, application owners can manage the traffic to their applications. And if they want to use the rate limit configuration, they have to create a specific ConfigMap object; one such ConfigMap is deployed for each of these services. When you want to use our service, we will give you an API token. This token is used in the rate limit configuration to limit the number of requests that can be made per second and per minute. But we don't use only the API token; we use a combination of four keys: the name of the service (the so-called generic key), the client IP address, the API token, and the request path. If any request made from the combination of these four keys exceeds the number configured in the rate limit configuration, it will get HTTP error code 429, Too Many Requests, for that combination of keys.

Well, to see that... I don't know if you can see anything this time. This is a JMeter script to make HTTP calls. It's very simple. I don't know if you can see it here: five threads in five seconds, to make one request per second. So if we run this script... okay, another fingers-crossed demo. If we run this script, we can see that all five requests went well. Okay, let's clear that, and we are going to change the ramp-up period to three: we are going to make five requests in three seconds. So if we run this script, now we are getting errors, and if we go to see the response code, it's a 429 response code. So there is no magic here; it's that simple.

Okay, so now we are protecting the service from others. But we have to protect it from ourselves, too. "The freedom to make my own mistakes was all I ever wanted," Mance Rayder said. Well, we all make mistakes. So how can we earn the freedom to make those mistakes? How can we protect our service from ourselves? Well, what we are looking for here is to minimize the possibility of deploying something that might malfunction or have errors in production. And to do that, we use something called shadow, or mirror, deployments, and we can configure this in Istio using the VirtualService resource.

So I'm going to explain the shadow deployment with a graphic. Well, we have version 1 deployed in production, running; the green arrows are real traffic. And when we run our shadow deployment, we are going to have four pods, in this case: two with version 1 and another two with our release candidate, our version 2. But in this case the arrows to version 2 are red: that is not real traffic, it's replicated traffic. So users are getting their answers from version 1, but Istio is replicating the traffic to test our new version with real traffic. So we are testing the new version in a more realistic way: we can see if there are errors, if the performance is better or worse, and we can decide if that version can go to production.
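A hedged sketch of the VirtualService that produces this behavior, with hypothetical host and subset names (the subsets would be defined in the matching DestinationRule): all real traffic is routed to the stable subset, while a copy of every request is mirrored to the release candidate.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: translate                    # hypothetical name
spec:
  hosts: ["translate.example.org"]   # hypothetical host
  gateways: ["translate-gateway"]
  http:
    - route:
        - destination:
            host: translate          # the application's Service object
            subset: stable           # version 1; users keep getting answers from here
          weight: 100
      mirror:
        host: translate
        subset: release-candidate    # version 2 receives a copy of the traffic
      mirrorPercentage:
        value: 100.0                 # replicate 100% of the requests
```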
But even so, we are free to make our own mistakes, and errors will happen, of course. So we go back to Game of Thrones looking for another piece of wisdom, now provided by Bronn talking with Tyrion Lannister: "If you are lucky, no one will notice you." Well, I want to focus on that "no one will notice you." When we deploy something in production, if it has errors or an error occurs, the best thing that can happen to us is that nobody notices that error, or at least only a few people notice it.

And how can we achieve that? Well, in Kubernetes we can use the rolling update feature, but with that feature we pass from version 1 to version 2; there are no versions running in parallel. So we are looking for another thing: a way to be quieter, to be stealthy. Again, we use Istio features, modifying the VirtualService resource. So we start with our version 1 deployed in production, and if we run our canary deployment, we are going to have again four pods running in production, but now all the arrows are green: both versions are answering our customers, our users. But with a canary deployment, we can set how many requests are going to be answered by each version, for example 80% by version 1 and 20% by version 2. So with this type of deployment, if there is any error, only a few people notice the problem, and we can go back very quickly: we change this value back to 100%, and the job is done.

But to achieve these kinds of functionalities, shadow and canary deployments, we need a lot of different YAML files with a lot of common objects in them. So to avoid duplicating files, we use Kustomize. And here you have, more or less, the directory structure for Kustomize. We have a base configuration where the common objects are defined, and then we have a directory for each kind of deployment: one for the stable deployment, another one for the mirror deployment, and another one for the canary deployment, because between those deployments the only objects that change are the DestinationRule and the VirtualService.

So, I have a video that I think doesn't show very well. Well, the first step to make this possible is to deploy the two versions. In this video, we have version 1 of the application, which is this one, and we are going to deploy the release candidate version. We use Jenkins to deploy our versions automatically, so maybe it's not... Well, we can launch the release candidate version from Jenkins. We have to change the version, of course: now we have the stable version, so we are going to change it to the release candidate. Okay. What is happening? It's going very... So, now we have seen the logs of the stable version. To make some tests, we are sending some requests from Postman, and we can see in the logs the API token used to make the requests and the PostmanRuntime user agent. It doesn't work... okay.

Well, in the meantime, we can see in the Kiali dashboard how our requests are going, because we have tested from Postman, but we have also configured a JMeter script to make requests without end. So, here we have 1.36 requests per second going to the stable version, and if we go forward, we have our release candidate here. You can't see it very well, but this is our release candidate; here in the tags you can see the release-candidate version. So now, the second step of our deployment is to change the VirtualService configuration to be able to distribute the traffic, or replicate it, between our versions.
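That second step amounts to patching the route weights (or the mirror block) in the VirtualService. A minimal sketch of the canary variant, reusing the same hypothetical names as above and splitting traffic 70/30 as in the demo that follows:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: translate                    # hypothetical name
spec:
  hosts: ["translate.example.org"]   # hypothetical host
  gateways: ["translate-gateway"]
  http:
    - route:
        - destination: { host: translate, subset: stable }
          weight: 70                 # 70% of requests answered by the stable version
        - destination: { host: translate, subset: release-candidate }
          weight: 30                 # 30% answered by the release candidate
```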
So, we have another pipeline in Jenkins to do that, where you can choose the application to be modified, the kind of deployment you want to do, in this case a mirror deployment, and the percentage of requests you want to replicate. In this case, we are going to replicate 100% of the requests. So, we launch the pipeline. Well, suddenly we are going to see how this graphic changes. We can see... okay. Now all requests are going to the stable version and to the release candidate version: the traffic has been replicated. We can see 2.24 requests per second, and the percentage of requests is balanced 50-50, because we are duplicating the traffic: we replicate 100%, so now 50% of the requests go to each deployment. Well, and if we go back to see the logs of our release candidate version, now we have logs from the replicated traffic.

And with this, we are going to launch the canary deployment. Well, when we launch the canary deployment, we have to choose how many requests are going to be answered by our stable version and how many by our release candidate version. In this example, we choose 70% for the stable version and 30% for the release candidate. Okay. And if we go forward a little bit more... well, now Istio is not replicating traffic, so we have 1.70 requests per second. And here we can see... well, you can't see it very well, but we would see how many requests are being answered by each deployment. So, we are going to go back to the presentation.

Well, to sum up. We have collected data; we have information, so we have power. We are able to be aware of errors; we have a surveillance system. We can protect our services from others; we built some structures to do that. We protect the service from ourselves; we can test our applications, our weapons, with real traffic. And we can deploy applications to production in silent mode, so we can be stealthy. So, now we have all that is needed to survive in the game of thrones. But, like Cersei Lannister said: "When you play the game of thrones, you win or you die. There is no middle ground." So today, we are winning. Let's keep it that way: we have to keep improving this architecture, this configuration, because in the future, who knows? So, that's all. Here's my contact info. Thank you for listening. If you have any questions... I don't know if we have time.