Thank you for the session. I'm here to share what we do at ViSenze. Let me introduce myself: my name is Hou Dejun, and I'm an engineering manager at ViSenze, where I've worked for more than six years. Today I want to talk about site reliability engineering (SRE) practice at ViSenze. Before I start, can I get a show of hands: how many of you know what site reliability engineering means, say compared with DevOps? Yes, generally speaking, that's it. The goal is to make sure that whenever you develop and provide a service to customers, the system runs reliably for its users.

Before I start, let me quickly introduce my company. ViSenze is a global leader in visual AI solutions for retail — retail is our focus. These are some of our customers: Rakuten, Uniqlo, Myntra, JD.com, Adidas, ASOS, and Mango. Some come from traditional retail, some from e-commerce. Currently ViSenze runs two business lines. One is the enterprise line, where we sell our technology to enterprise customers like the ones I just listed. The other we call B2B2C, which means we provide a service that reaches end users directly to give them a certain experience — I'll show that in a moment. ViSenze has been named among the top 10 technology companies in retail, and we were also mentioned by our Prime Minister, Lee Hsien Loong, who highlighted ViSenze along with Razer and Sea as examples of successful companies in Singapore.

So these are our typical use cases, the ones I just mentioned. The first one is B2B.
Here we face the customer and provide the technology. As you can see, when an enterprise customer integrates our technology, they can embed it into their website to enable a visual search experience: the user takes a picture of a product and triggers a search to find it. That's the first use case.

The second is B2B2C. Here we cooperate with OEM phone partners such as Samsung, Vivo, and LG, who integrate our technology into their native camera apps. That means when you buy a new Samsung, Huawei, LG, or Vivo phone, our technology is inside: you open the camera and you see this function. For example, on Samsung you can point your phone at any product you want to search for, and the system automatically detects the object and triggers a visual search — the backend is powered by us. This is how Samsung integrates with us and the user experience they built. Huawei integrates with us the same way; the user experience is a little different, but the overall scenario is the same. And this is how LG integrates with us as well.

We are growing. As a business we run our service globally, because our customers are global: some are in Europe, some in the U.S., some in China. So the best way to serve them is to make sure our service runs globally. Currently we have six cloud data centers deployed across regions: Japan, the U.S., Europe, Singapore, and China.
Especially over the last year, you can see the traffic growing: almost three times the traffic compared with the same time last year. That's how fast we're growing, and it brings challenges. The first is global service management: you need to manage your service globally to serve global users. The second is rapid traffic growth: in the coming years we'll have more and more users, and traffic will keep growing quickly. The third is complexity: the system becomes more complex as it scales, and when that happens, maintaining a good service level agreement with customers and users becomes challenging as well. The last challenge is incident response: when a reliability problem happens — what we call an incident — how can we respond quickly enough to save the user experience? Those are the challenges we face, and that's why we brought site reliability engineering into ViSenze's culture, to improve our overall service availability.

So why is reliability so important? I think this is very straightforward. Imagine I'm one of our partners or customers and I open their website — this is just an example — and suddenly the site won't load and I can't see anything. What will the user think about the system? They'll conclude the system is broken.
That kind of experience is very bad. If this is an important partner, they'll start to wonder whether the company can still provide a reliable service to them. This is a very typical case; every company has this kind of problem. Back to our case: ViSenze provides visual search to our customers. If an end user wants to use their phone to trigger a search and find a product, but the search takes a very long time to return results, they may lose interest and stop using it. And because we're the ones bringing this shopping experience to users, from a business perspective we are losing revenue — losing money. So reliability has always been very important to us as a company: it's not just a user-experience problem, it's a business problem as well.

So what actually is site reliability engineering? The concept was first raised by Google. In Google's definition, site reliability engineering is a discipline that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goal is to create scalable and highly reliable software systems. That's Google's definition when they first came up with this idea. In the following slides, I'll share how we put site reliability engineering into practice at ViSenze.
I'll share how we moved our operations from the traditional way to the current way, how we define our reliability, and how we run our monitoring system — because if you talk about reliability engineering, monitoring is a critical part. Then I'll share how we improve reliability: here we'll cover the traditional DevOps pipeline, as well as newer things like our infrastructure and change management strategy. Near the end I also want to share our post-mortem culture. This is very important: if you want reliability to be part of your culture, post-mortems make sure you can learn from incidents. The last part is SRE metrics: if we're running SRE, how can we make sure our SRE is effective — compared with plain DevOps, compared with other companies, how do we know we're doing better? Any questions? Okay.

This is the path that brought us to where we are now. When we first launched the company, we ran very standard, traditional operations. We had dedicated operations people doing the operational work: when the developers had a feature released and delivered, they handed it over to the operations team, who were in charge of pushing it to the cloud and running it there. That was the first stage. But after running this model for a few years, we found some problems — I'll share them shortly. Then we moved to DevOps, which has been a very common practice in the industry for years.
We introduced DevOps to keep improving our practice and our reliability, but we still faced some problems. What problems? I'll share that with you in a moment. And finally, we've now moved to site reliability engineering, so we can use a software-engineering approach to manage operations and reliability.

Let's think back to the traditional operator model. What are its challenges? In this model there are two kinds of people: developers and operators. The developer is always thinking: how can I quickly finish my code, finish my features, and push them online? That's a very straightforward mindset. For the operator, the main responsibility is to make sure the online system is stable and reliable. That's how the two roles are positioned. The problem is that operators don't understand the code: how it's written, how the codebase is organized, how the system runs underneath — they have little understanding of all that. Developers have the opposite problem: they normally focus only on feature development and coding, and have little understanding of operational practice. What is operational practice? For example, when you face a problem, how do you quickly bring the service back up? Developers usually don't have that mindset, because they don't have that experience.
This misalignment always brings conflicts. The developer wants to push the feature, under pressure from the product manager or the company to finish it as soon as possible. The operator wants to keep the service as stable as possible: "let's slow down, give me more time to make the service stable." That's the kind of conflict you typically face when running this operator model. So the problem is the lack of alignment between operators and developers — and that's how the DevOps concept came about. DevOps is a set of practices that automates the processes between software development and IT operations. It aims to build practices that reduce this gap, so the two sides can align on service reliability.

In DevOps there are five key practices people always want to implement. The first is to reduce organizational silos. What does that mean? We don't want a rigid barrier between the development team and the IT team; we want the two teams to work together to make the software more stable. That's one principle: always try to reduce organizational silos. The second practice is to accept failure as normal. Software runs on servers, servers are hosted in data centers, and servers are definitely not perfectly reliable. Networks are not reliable either, and most services also involve human operation, which makes things more complex. So reliability only becomes more challenging.
Failure is a common problem for any service. No service can guarantee 100% reliability — that's not possible — so we need to accept it. The third DevOps practice is to implement gradual change. When you make changes to a live system, try to avoid big changes all at once. If a big change can be broken down into small changes applied one by one, then if any problem happens, you can quickly identify which change caused it, and quickly roll back if the system is broken. That's the principle of gradual change: always make a small change first, verify it, then push the next one. The next principle is to leverage tooling and automation to reduce manual human operation, so a software-driven approach makes the system run more smoothly. The last one is to measure everything: with DevOps, you need to make sure everything is measurable, controllable, and monitorable. Those are the principles DevOps highlights.

Then we come to SRE. If we treat DevOps as something like a programming-language interface, what's the difference with SRE? As I mentioned, DevOps gives you principles and practices you should follow, but how you follow them differs between companies and organizations — each has its own ideas and its own learnings. So you can think of SRE as a concrete implementation of DevOps. From a software perspective, it looks like this.
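Google famously summarizes this relationship on a slide as "class SRE implements interface DevOps". A rough Python rendering of that metaphor (the class bodies are just illustrative):

```python
class DevOps:
    """The 'interface': principles such as reducing silos, accepting
    failure as normal, gradual change, automation, measuring everything."""


class SRE(DevOps):
    """One concrete implementation of those principles, with specific
    practices: SLIs and SLOs, monitoring, blameless post-mortems."""


# SRE "implements" DevOps: every SRE practice is also a DevOps practice.
print(issubclass(SRE, DevOps))
```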
So when someone asks you what the difference is between DevOps and SRE — for my part, I can't say they're very different. SRE just makes the implementation concrete and gives you guidance on how to actually do it. That's the idea.

When we talk about service reliability, the first term we always hear is SLA. When you talk with a customer or a user, the first question is often: can you share your service SLA? A few years ago, when I first heard this, I had a question mark: what does your SLA mean? Are you talking about latency? About service availability? Or — since we do visual search — about search performance or search accuracy? These are all different definitions hiding behind "SLA". That's why SRE practice puts this first and emphasizes it: make sure that we as a company and our customers have an agreement on what the SLA means. In Google's SRE definition, this breaks down into three levels. The first is the SLI, the service level indicator. When you talk about an SLA, first separate out the different metrics you want to talk about: your SLA might cover latency; it might cover availability, which may be related to the error rate; or it might cover some other performance standard. So first, define and choose the indicators that will represent your service reliability.
Once we have indicators, we can define objectives. For latency, for example, we can define: our service latency is under one second — not for every request, but for 99% of them. That's an objective, the SLO. Then, when you put a range of such objectives together, the result is what we call the SLA, the service level agreement: a kind of contract in which the service provider promises the customer a certain availability and performance. That's the relationship between these concepts: SLIs drive SLOs, and together they form the SLA. Any questions on this part? It can be a little confusing, because there are so many new concepts to define.

Let me give an example: availability. How do we define availability in terms of an SLI and SLO? Normally we use a formula like this. First, define for your API which scenarios count as success and which don't — every customer and company can have a different definition. A typical one: any HTTP 5xx response is treated as a failure, and everything else as a success. Once that's defined, take the count of successful API calls and divide it by the count of all API calls. The resulting ratio is the availability of your service — that's one service level indicator. With that indicator in hand, we can talk about the objective: what can we guarantee to the customer? For example, I can guarantee that 97% of API calls succeed.
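As a concrete sketch of these definitions — the availability formula and the 99%-under-one-second latency objective just described — here is a minimal Python version (the function names and sample numbers are my own, purely for illustration):

```python
def availability_sli(success_calls: int, total_calls: int) -> float:
    """Count of successful API calls divided by count of all calls.
    'Success' is whatever the agreement says, e.g. any non-5xx response."""
    return success_calls / total_calls if total_calls else 1.0

def latency_sli(durations_s, threshold_s=1.0):
    """Fraction of API calls that completed under the agreed threshold."""
    return sum(d < threshold_s for d in durations_s) / len(durations_s)

# Illustrative measurement window: 10,000 calls, 9,850 of them non-5xx.
availability = availability_sli(9_850, 10_000)     # 0.985
meets_availability_slo = availability >= 0.97      # SLO: 97% of calls succeed

# Ten sample call durations in seconds; nine of them finish under 1 s.
durations = [0.2, 0.4, 0.9, 1.5, 0.3, 0.6, 0.8, 0.5, 0.7, 0.1]
latency = latency_sli(durations)                   # 0.9
meets_latency_slo = latency >= 0.99                # SLO: 99% under 1 s
```

A set of such objectives, agreed with the customer, is what then gets written down as the SLA.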
That's the objective. You also have other indicators to represent your service reliability — for example, latency. For any service there are really two key aspects: availability and latency. So how do we design the latency SLI? First, you need an agreement with your product team and your customer on what latency range is acceptable. In our case, for example, the API latency must be less than 500 milliseconds — that's the accepted standard, and we always measure against it. So we define the service level indicator as the number of API calls whose latency is under 500 milliseconds, divided by the total number of API calls. That's your latency service level indicator. Then comes the objective: what can we guarantee the user? You need to define that number as well; for latency, we usually choose 99% as our standard. That's a typical example.

If you don't have a good idea how to define these, there are the four golden signals you can simply use as a reference. Latency is definitely the most important one: as a service provider, always think about latency first. Then the error rate, which usually maps to availability, as I just mentioned. When we talk about errors, there are a few types. The first is HTTP status errors — very easy to catch. Another kind is business errors.
For example, the HTTP call succeeds, but the service side returns a "busy" message, or an API-limit-exceeded error — we also treat that as an error. The last kind might be things like invalid parameters, or an empty result in the response; you can treat those as errors too. It's subject to how you define your business.

Traffic is also important. You need to define a traffic metric as well, because without it you don't know how much traffic you have and what capacity your system can support. This is a metric every company needs to define.

Saturation is the concept that, given your system is currently running on, say, 100 instances in the cloud, how much more load can that instance group support? We break it down into a few metrics — I/O ratio, CPU utilization, memory utilization, and disk utilization — so you know the current load and your remaining capacity. For example, we set a standard rule: if memory hits 80%, that's a warning — it triggers an alert in monitoring, and our engineers go and address it.

This comes back to the metrics I just mentioned. At ViSenze we follow this standard when implementing our metric definitions. At the application layer, we always have availability, error rate, latency, and throughput defined and fed into our monitoring. Below that is the Kubernetes infrastructure layer — all our services run in Kubernetes.
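The saturation rule just described — memory at 80% raises a warning — can be sketched like this (the threshold and resource names are illustrative, and in a real setup the utilization values would come from the monitoring system rather than being passed in by hand):

```python
def check_saturation(usage, warn_at=0.80):
    """Return warning messages for any resource at or above the threshold.

    `usage` maps a resource name (cpu, memory, disk, ...) to its current
    utilization as a fraction of capacity.
    """
    return [f"WARN: {name} at {value:.0%} (>= {warn_at:.0%})"
            for name, value in usage.items() if value >= warn_at]

# Memory is over the 80% line, so it is the only alert raised.
alerts = check_saturation({"cpu": 0.55, "memory": 0.83, "disk": 0.41})
```

In practice each warning would be routed to the on-call engineer through the alerting channels described below, rather than just returned as a string.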
From Kubernetes we can collect metrics as well — CPU utilization, memory utilization, and various events. The lowest layer is the physical server layer, where we also collect metrics such as CPU utilization, memory, and disk usage on the physical machines. All these metrics map back to the four golden signals I just mentioned. We push them automatically into Prometheus, the industry-standard service. Once your metrics are in Prometheus, it's easy to define your monitoring strategy: you can send alerts to email, Slack, or PagerDuty, and underneath you can also trigger escalation policies. That's how we run our monitoring system at ViSenze. Any questions?

The monitoring setup itself is also very important. You need to make sure monitoring is open, so everyone can access it. We built dashboards that anyone can open to see how our service is running: how many API calls failed, what the latency is, the current running status — all that information is on our dashboards.

Another important thing: that monitoring only shows information from our side. We also monitor from the end-user perspective. We're in Singapore, so from our server logs we can understand our latency. But what if the user is coming from Russia? From China? From the U.S.? What's the latency difference?
This is very important: when you talk about monitoring, you also need to think from the end user's perspective. So we built this kind of dashboard, where we can easily see which regions have the highest latency. As you can see here, the dark areas are where latency is very high, so we highlight them for our engineers: latency there is not good, and we need to think about how to improve it.

We've been running monitoring quite aggressively and have learned a lot over the last few years, starting with our early internal monitoring. When we first ran the monitoring system, we chose metrics like the average. That's the easy choice — almost every company starts with the average to represent performance — but the average can be misleading, and you will run into problems. Once we found those problems, we switched monitoring to percentiles. We always use the 95th percentile — or, more aggressively, the 99th percentile — of latency to measure the system. For example, from the average perspective our service might achieve 300 milliseconds, which looks quite good. But some users are still unhappy, because sometimes their request takes a few seconds to respond. If you monitor the average, the service looks fine, but that doesn't reflect the real user experience. The 99th percentile is much closer to the end user, so we use this stricter strategy to push our engineers to think about every single user. The other lesson, as I just mentioned, is the geographic monitoring system — that's very important too.
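To make the average-versus-percentile point concrete, here is a small sketch (the numbers are invented) where the average looks healthy while the 99th percentile exposes the slow requests that real users actually hit:

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile; pct is between 0 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# 100 requests: 97 fast ones, plus three multi-second stragglers.
latencies_ms = [300] * 97 + [4000, 5000, 6000]

avg = sum(latencies_ms) / len(latencies_ms)   # 441.0 ms -- looks fine
p99 = percentile(latencies_ms, 99)            # 5000 ms -- the real tail
```

Alerting on `p99` rather than `avg` is what pushes engineers to fix the long tail instead of declaring victory on the average.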
When you talk about monitoring, you need to monitor from the user's point of view, not just from our own service-operator point of view. For example, a customer in Europe calls us: "why is your latency so unstable?" But when our operations people look at our service, everything seems stable — what happened? This is very hard to manage if you only look at data from our side. That's why we built monitoring from the end user's side: we set up monitoring agents in Europe, trigger requests from there, and check what really happens from that point of view. We continually optimize this long-tail user experience, and matching our standard, we always use the 99th percentile for monitoring. We built this monitoring system because it's critical: without good monitoring, you don't know what's happening to your users.

Now, the DevOps pipeline. We've just talked about defining service reliability and collecting metrics to understand it; next is how to improve reliability through the pipeline. Here we also follow traditional DevOps culture and built several pipelines, working together with the developers, to provide tools that speed up development and iteration. We have the CI pipeline, which lets developers run the build steps quickly without much overhead: building the artifacts, running unit tests, building Docker images, and pushing them. The second pipeline is delivery.
Delivery is not deployment. When a service is ready, you need to test and verify that it's correct. ViSenze is somewhat unusual here, because we have a large AI component: when a new model is released, how do you make sure the output accuracy doesn't regress? There is a lot of testing you need to run before you really push it online. So we have a continuous delivery pipeline: whenever the code changes, or whenever our data scientists and data engineers release a new model, it runs through this pipeline to make sure nothing breaks the system. That's how the pipeline is designed. The last pipeline is deployment. Once you've finished testing, verification, and evaluation, you can move to deployment. We built this pipeline for the developers too: they don't need to touch the infrastructure — they just use the system to quickly launch a deployment to production, without worrying too much about breaking anything, because the pipeline itself helps them handle those problems. That's the pipeline story. Remember the DevOps principle of leveraging automation and tools? This is how we build internal tools to improve productivity.

Another strategy we embrace for change management is infrastructure as code. How many of you know what infrastructure as code means? Okay, you know. Infrastructure as code asks you to change your operations mindset. Normally, how do we do operations the traditional way?
Our operations person SSHes into servers, or logs in to the AWS or Google Cloud console, and changes some configuration — clicks some buttons, opens some pages, that kind of operational work — and maybe repeats those tasks every day, and the service follows the instructions. That's the typical way of operating. What's the problem? This kind of operation breaks easily, because there's a human in the loop: people mis-operate, click the wrong button, and can suddenly even delete your service. That's very dangerous, and this kind of thing happens in every company. In previous years it happened to us too: even people who had worked at our company for years, very experienced, still occasionally mis-operated — suddenly clicked some button and broke my system. It really happened a few times. So I kept thinking: how can we prevent this? That's how we converged on infrastructure as code for change management.

What does it mean? Infrastructure as code means defining your infrastructure in code. Like a software engineer defining an API: you write code, using logic and flow, and combine it into something runnable. The same applies to operations: can you define your operations as code? If your operations are defined in a codebase, then I can bring very traditional software practices to bear on them. I can validate an operation, because it's code: I can run tests against it, and I can bring in code review.
I can bring in auditing to make sure the change doesn't break my system. So this is the difference. The overall benefit is that you can control operations: nobody can directly touch your system. When they want to touch the system, they need to change something in the code, and I can approve that code. Without an approved code change, nothing can be pushed to production. So this is the change of mindset. I just talked about improving tools and having this kind of automation. Next, you may be interested in how we handle incidents. We have an incident management practice. First, always treat communication as the first priority; good communication is essential to a good incident response strategy. The second part is the operation itself. If you want a good incident management practice, a few things are very important. The first one is the commander: always have one single incident commander to control all the cooperation and coordination, and the commander assigns operators to do the actual operational work. So this also guides how operations are done during an incident. Another very important thing is the post-mortem. A lot of incidents happen, maybe every month, every quarter. How can we learn from them to improve ourselves? So we built this post-mortem culture. And the first thing about a post-mortem culture is very important: it needs to be blameless. Otherwise the engineers won't want to do post-mortems anymore. So we also built some practices and culture around running these post-mortems. So, how do we measure SRE? We have been running SRE for quite a long time.
How do I measure whether my team's work is effective? We defined metrics for this. Overall, we have three critical metrics to measure our teams. The first one is time to detection: when an incident happens, how quickly can the event be detected, whether by our system or by our users? That is one of our metrics. Another one is time to engage: once the event is detected, how quickly can my engineers jump in and take action? And the last one is very important: time to fix. How long do your engineers spend fixing the issue? These three metrics are very important for measuring how efficiently SRE is running in your company. We built a dashboard so we can continuously monitor how efficient my team is: how many alerts there are, how long it takes the engineers to engage with those alerts, and how long it takes them to fix them. Another strategy, when you define your SLO, as Jackson mentioned: you need continuous improvement. You cannot just set it once. Currently we define that 95% of requests must have latency of less than one second. We use this as the standard to measure everyone, and the system needs to keep improving, so you continuously improve your monitoring and improve the system. Another important practice we emphasize, because we treat SRE as very critical for the team: whenever we have a new feature delivery or new feature planning, we always put my SRE engineers into the tech design from the very first stage, joining the team's activities, so that they can understand what is happening with this feature. What are the requirements from the user? How did our engineers design this system? Then they can better understand the system and do their job well.
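The three metrics and the latency SLO described above can be sketched as simple computations over per-incident timestamps and per-request latencies. The field names and example numbers are assumptions for illustration, not the actual dashboard.

```python
# Illustrative sketch of the three incident metrics (time to detection,
# time to engage, time to fix) and the 95%-under-1s latency SLO.
# Field names and example values are assumptions.

from datetime import datetime, timedelta

def incident_metrics(started, detected, engaged, resolved):
    """Compute the three per-incident durations from its timestamps."""
    return {
        "time_to_detection": detected - started,
        "time_to_engage": engaged - detected,
        "time_to_fix": resolved - engaged,
    }

def slo_met(latencies_ms, threshold_ms=1000, target=0.95):
    """Check the SLO from the talk: at least 95% of requests under 1 s."""
    under = sum(1 for latency in latencies_ms if latency < threshold_ms)
    return under / len(latencies_ms) >= target

t0 = datetime(2020, 1, 1, 12, 0)
m = incident_metrics(
    started=t0,
    detected=t0 + timedelta(minutes=3),
    engaged=t0 + timedelta(minutes=8),
    resolved=t0 + timedelta(minutes=38),
)
# For this incident: detection took 3 minutes, engagement 5 minutes,
# and the fix 30 minutes.
```

Aggregating these durations across incidents (for example as monthly averages) is what a dashboard like the one described would plot.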
So when an incident happens, they have a good way to solve the problem. This is very important: join the tech team, engage the SRE engineers in the tech design review. Another one is the playbook. When an incident happens, you need a playbook to follow so that you can easily solve the problem, so that not only the SRE engineers but any engineer can solve it. And here is the last one, the overall tech stack. I think this is quite standard in the industry. On the build side we have Jenkins, Docker, GitHub, some code-checking standards, and the pipeline. For the pipeline we have Spinnaker, Terraform, Ansible, Helm, and Slack; this is the tech stack we use to build the pipeline. For infrastructure, we use AWS and Google Cloud as our cloud providers, and Kubernetes is our infrastructure standard; we also use some vendors to help us manage these services. For monitoring, we have Prometheus, Datadog, InfluxDB, BigQuery, OpsGenie, and Grafana, this kind of stack. Yeah, thank you. Any questions? We also currently have some open positions; we welcome students, and if you want an internship or to work in our company, you can visit our website to see the open positions. If you are interested, you can apply online. Yeah, thanks.