All right. Hi, everyone. Really happy to be here in front of you today. I'm Adrian. I'm a senior ops engineer at Decathlon, in the Cloud Platform Engineering Business Unit. As the name suggests, we provide the Cloud Platform product to our internal users. The idea is to standardize and ease the adoption of cloud solutions for all our IT users. I'm more specifically in the Cloud Native Partner team, which handles everything related to Kubernetes, container orchestration, and serverless components, to integrate them into this platform.

And today, I will talk to you about service mesh. At first, I wanted to talk to you about the study we did about service mesh last year. But then I changed my mind, and I'm going to talk to you about how we can improve service mesh with AI at the edge. Basically, the idea is to see if you can connect, let's say, your connected bike into your service mesh and talk to it on your way to work in order to configure your various service meshes. You don't seem so excited. So I'll actually talk to you about the study we did about service mesh. And basically, this is the conclusion: service mesh is not a solution. Thank you. Feedback appreciated. We are hiring, by the way. Thank you.

I saw some hope in the eyes of my colleagues in the front, thinking, yeah, we can go to lunch early. But I owe it to you to dive in more on that. So why is service mesh not the best solution? We'll see a bit about that later.

First, a few words about Decathlon. Decathlon is the world's largest sporting goods retailer. We had $15 billion in revenue in 2022, and we have around 5,000 teammates, as we call them, working in IT and in digital overall.

A few words about our container journey. We started with containers quite early. We did our first tests with Kubernetes in 2017, and we put our first Kubernetes clusters in production in 2019. In 2021, we went full public cloud, so we basically shut down our last data centers in 2021.
That was the end of a long journey that started in 2016. And now we're in 2024. And in 2023, we did this study about service mesh.

So we'll look at why we did this study. We'll look at what our users' needs are and who these users are. We'll focus a bit more on service mesh itself, to see how it works and what it does. And then we'll come back to the study itself and its conclusions.

So first of all, the context. Before diving into the container context, I want to say a few words about our API management system, because it's quite relevant for the next part of the talk. Basically, our API management system is an API gateway like the ones we all use. The idea is to centralize the exposure of APIs and the API calls so you can enforce quality and security. The thing to note here is that the architecture of our current API gateway relies on big centralized gateways that are, let's say, continental. We have one big gateway in Europe, one big gateway in Asia, one big gateway in the Americas. And all our APIs are exposed through this API gateway, because it's like the golden rule at Decathlon: if you have an API, it must be exposed through the API management system. So basically, all the API calls go through these gateways, whether they are external or from service to service.

So what are the current issues for our Kubernetes users? Well, as I said, we have been using Kubernetes for quite some years now, so we are past the day one and day two issues, and our users are starting to have more issues related to the business. The first of these issues is all about costs. At Decathlon, we have, let's say, historically a lot of small clusters. This induces some costs, because there are a lot of clusters to manage, and, well, you have to pay for them. This is changing, by the way, but still, we have a lot of clusters with a couple of services each. The other thing is related to the way we use the API management system.
You can imagine that having this big centralized gateway creates some latencies. If you have a service that wants to talk to another service that is in the same cluster or in a cluster nearby, the call has to go out to the gateway, and then the gateway reaches the other service. So this induces latencies, of course, but also costs. And for the tricky ones that don't use the API management system, well, they are in a pickle, because they don't have a lot of observability: observing the network inside a vanilla Kubernetes cluster is quite opaque. So they don't benefit from the metrics and such from the API management system.

If we take a step back, we are actually looking at east-west traffic, because we are looking at how we can optimize service-to-service communication.

So let's see now who the users that expressed these kinds of issues are. These are the domains that reached out to us and said, yeah, we are having these issues with latencies and costs. And if you look at them, you can imagine that these are not some small group of users coding in the dark in a corner of Decathlon. These are basically some of our biggest internal users. They represent a large portion of our IT users and also a large portion of Decathlon's revenue. So we really had to address their issues and their needs.

So we have these users. They have business goals, you know, like every one of us: goals linked to improving performance, solving issues faster, or reducing costs. And we have this context of how to improve service-to-service communication and east-west traffic management. If you combine the two, you can deduce some technical needs on which you can act to try to solve the business issues for the users. And, well, these technical needs are these ones.
Multi-cluster, dynamic routing, east-west filtering, and more observability. And if you know the Kubernetes and cloud native ecosystem a bit, you'll think service mesh, right? This must be the answer. Well, service mesh is not the best solution. But what is service mesh anyway?

Service mesh is like an additional network layer, an additional communication layer inside your Kubernetes clusters, that sits on top of the other network layers that already exist. You have your cloud provider network layer, you have your Kubernetes cluster network, often some networks in between, and then you add yet another network layer. The way this network layer works is that, without service mesh, your pods communicate over the network of the Kubernetes cluster. In the most common implementation of service mesh, all the traffic goes through sidecar proxies that are deployed inside each pod. So now all the traffic going in and out of the pods goes through the proxy, which is a layer seven proxy, and you can do a lot of things with this layer seven proxy. You also have the control plane: the control plane pushes configuration to these proxies to basically tell them what they need to do with the traffic they receive.

A layer seven proxy that manages all your traffic means a lot of functionality. And this is really one of the main selling points of service mesh: you can do a lot, and you can offload capabilities or functionalities from the code. The most simple and common example is TLS. If you ask your developers to implement TLS encryption between services, it's not an easy feat, because they have to connect to a PKI, they have to manage certificate rotation, certificate revocation, and so on. In a service mesh it's quite easy; it's one of the basic features. It relies on the proxies. Basically you say "mutual TLS inside the mesh: true", push the config, and then boom.
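In Istio, for instance — a sketch, and an assumption on my part, since it is not necessarily the exact setup discussed here — that "push the config" step is a single resource applied in the mesh's root namespace:

```yaml
# Hypothetical example: mesh-wide mutual TLS in Istio.
# A PeerAuthentication in the root namespace (istio-system by default)
# makes every sidecar proxy require mTLS for incoming traffic.
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
```

The control plane pushes this to every proxy, and the proxies already hold workload certificates, so no application code changes at all.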
All the traffic inside your mesh is TLS encrypted, because all the proxies have a TLS certificate and they use it to encrypt the communication between the proxies.

So you can imagine there are, as I said, a lot of functionalities in a service mesh. I divided these functionalities into four big categories, or domains. What do you think these categories are? You can shout them out. Security, yeah. Observability, yeah. Metrics, yeah, observability. Discovery, yeah. Good. You basically got most of them.

First is traffic management: the layer seven proxy watches all the traffic, so you can act on the headers, act on the requests, et cetera. Security, of course: I was touching on that with mutual TLS. Each pod has an identity, so you can do authentication, authorization, filtering, et cetera. Next is observability. Once again, a layer seven proxy can watch all the traffic that goes through it and extract the number of requests, the latencies, and so on. The last one I call topology. It is basically the ability to expand a mesh across Kubernetes clusters, or across virtual machines or other workloads. This way a service in one cluster can communicate with a service in another cluster completely transparently, as if they were in the same cluster, and the mesh handles all the underlying things.

So, next question for you: what do you think the benefits of all these functionalities are when using service mesh? Well, these are the benefits we found during our first assessment, based on the various topics I was talking about earlier. First is simplifying the developer experience by offloading some capabilities from the code. Next is better visibility, so observability of internal traffic; as I said, in a vanilla Kubernetes cluster, networking is quite opaque without any additional tool. Then, secure communication between services.
Of course, with security and mutual TLS, service mesh is the way to go if you want to implement zero trust, for instance. And functionally, it can ease multi-tenancy, since it gives you more tools and more solutions to easily isolate services in bigger Kubernetes clusters. And, well, pragmatically, service mesh is one tool that packages a lot of functionalities. So it's one tool you have to maintain, and it offers you a lot of possibilities.

Of course, there are drawbacks to service mesh. The main one is its operational complexity. As I said, it's another network layer, a lot of proxies next to each pod, and you have to manage that, operate it, maintain it, upgrade it, et cetera, et cetera. And it can be quite tricky: when it works, it works, and when it doesn't, well, you can be lost. The sidecar is a cool way to have a proxy, but, well, it has its own shortcomings: it consumes resources, and it adds latencies and hops in your traffic.

The last two drawbacks are more linked to the Decathlon context. Basically, service meshes advertise one-click observability and dashboards. Well, this often implies having other monitoring tools like Prometheus, Jaeger, Zipkin, et cetera. And if you don't want to install and manage those, then it takes more work to integrate the mesh into your existing tooling or existing solutions. The last one is functional: there's a new redundancy that appears between the API management system, the API gateways, and the service mesh, because where do you implement your routing? Where do you implement your filtering? And you can even take the ingress into consideration, if you have this additional ingress layer.

So, the sidecar is a big pain point of service mesh, right? It consumes a lot of resources, and it has a lot of drawbacks and shortcomings.
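To give an idea of where that routing and filtering redundancy comes from: in Istio terms, for example, east-west filtering is an AuthorizationPolicy enforced by the proxies. A sketch, with hypothetical namespaces, labels, and service accounts:

```yaml
# Hypothetical example: only workloads running as the "orders" service
# account may call POST /v1/charge on the payments workloads; any other
# east-west request to them is denied by the sidecar proxies.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: payments-allow-orders
  namespace: payments
spec:
  selector:
    matchLabels:
      app: payments
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/orders/sa/orders"]
      to:
        - operation:
            methods: ["POST"]
            paths: ["/v1/charge"]
```

The same rule could just as well live in an API gateway sitting in front of the service — which is exactly the redundancy question.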
And service mesh developers thought about this, and they came up with a solution to remove this sidecar and ease the operation of the service mesh. The idea is to replace one proxy with two proxies. So, yeah, simplicity, right? Well, it's simpler anyway, because they are not at the same level. The idea is to have a layer four proxy that handles all the layer four traffic — basically all the traffic of the mesh — and applies the layer four policies, plus an additional, optional layer seven proxy that handles all the layer seven policies inside the mesh. And by making the layer seven proxy optional, you basically get better performance, because you don't go through the additional layer seven proxy if your traffic doesn't need it.

There are currently two implementations of this model. The first one is Istio's ambient mesh. In ambient mesh, you have layer four proxies on each host that are called zero-trust tunnels, or ztunnels. They basically establish tunnels between the hosts, and all the traffic of the mesh goes through these ztunnel proxies. Then you have layer seven proxies called waypoint proxies, which are basically Envoy proxies. The thing is that there is one waypoint proxy per identity, so basically per service account. The idea is that if you have a pod with one service account that wants to talk to another pod with another service account, your traffic will go through this waypoint proxy; and if you have additional layer seven policies, they are applied by this proxy as well.

The other implementation is Cilium service mesh. Once again, you have tunnels between the hosts; these are IPsec or WireGuard. The layer four policies are handled by the Cilium CNI and eBPF. And for the layer seven policies, you have an Envoy proxy on each host.
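On the Cilium side, a layer seven rule is expressed with the CiliumNetworkPolicy CRD and enforced by that per-host Envoy. A sketch, with hypothetical app labels:

```yaml
# Hypothetical example: pods labeled app=frontend may only issue
# GET requests under /api/ to pods labeled app=backend on port 8080;
# eBPF handles L3/L4, the per-host Envoy enforces the HTTP rules.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-l7-policy
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:
              - method: "GET"
                path: "/api/.*"
```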
The main difference between Cilium and ambient mesh is that the layer seven proxy in Cilium is multi-tenant, because there is only one per host, so it handles all the identities of the tenants. I also have to mention that these implementations are quite new: ambient mesh is still experimental — it's going to beta soon — and I think Cilium service mesh is stable, but, well, it's still quite new. But if you're looking at service mesh in a couple of months, or maybe a year from now, you should have a look at them. They may not be your choice, but, well, more choice is good, right?

All right, let's go back to our study at Decathlon and its conclusions. The goal of the study was really to assess whether service mesh was the solution to our users' needs. It was not to choose the best service mesh solution or the best implementation of service mesh. It was really to see if service mesh was good enough for us to use it every day and deploy it at scale.

And we conducted the study quite simply, actually. We took the needs from our users and translated them into scenarios, or user stories, whatever you call them. Some were required, because they answered the needs of our customers, and some were optional, because a service mesh has a lot of functionalities, so we could opportunistically take advantage of these additional capabilities. This is an example of one of these scenarios: basically, a scenario that says we want the service mesh to be able to expand between Kubernetes clusters, to handle service-to-service communication between clusters, for failover, et cetera, et cetera.

We also took a hard look at the service mesh ecosystem. The idea was to have the broadest overview of the service mesh market. So we looked at open source solutions, we looked at commercial solutions, and we looked at various implementations of service mesh, like the Cilium one.
And the idea was to see how each implementation, or each kind of solution, would behave when used with these various scenarios. So we took that, we went back to our users who had expressed their needs and their issues, and we told them, okay, we may have a solution to what you're facing, but we need you to test it. The idea was to test it inside some of our users' environments — development environments, or environments built on purpose, but with our users' applications — to see whether the solutions would actually answer their needs and how they would operate them.

So we took all the tests, we aggregated the data, we reviewed it, and we asked ourselves: well, is service mesh the solution? Do we need to put it everywhere?

And before answering that question, I want to go back to our little old friend, the API gateway and API management system. Because when our users expressed these needs, our API management team didn't sit back and relax. They also heard them, and they got to work. And they came up with this, which is, let's say, a new design pattern of our API management system, or for API gateways in general. The idea is to deploy an API gateway as close as possible to the users' workloads, directly inside the Kubernetes clusters. And what's cool is that the users would use the same control plane — the same interface and APIs they were already using to configure the big centralized gateways. This way, it would alleviate the costs and the latencies that were incurred by going out to the big centralized gateways.

So basically now we had two solutions, which is pretty cool, because sometimes you have none. But, well, it's not that easy. If we take back the needs I was talking about earlier: can service mesh do multi-cluster? Yes, of course. It's one of the nice features of service mesh.
Can the API gateway, can the micro-gateway do it? The nickname for this design pattern is micro-gateway because, well, you take a gateway with a smaller configuration — only the configuration that is relevant to the context where it is deployed. So, can the micro-gateway do multi-cluster? Yes-ish, because the idea here is that when a service requests another service, it just requests its internal, let's say neighbor, micro-gateway, and the micro-gateway handles contacting the service, which may be external or inside the same cluster. So all the management of reaching and authenticating to the services is handled by the micro-gateway. So yes, ish.

Can service mesh do dynamic routing? Of course: layer seven proxies, et cetera. Can the API gateway do dynamic routing? Of course. Observability? Once again, both do it well; they use layer seven proxies, after all. And the last one, east-west filtering? Once again, both do it.

So we basically had two solutions that answered our needs. One was a cool new tool, service mesh, that we would have to deploy everywhere at Decathlon. And one was the micro-gateway, a nice new design pattern for a tool that we already know and use. That's basically why service mesh is not the best solution. Actually, the real answer is that service mesh was not the best solution for Decathlon, because we had another solution that just took something that already existed and adapted it to new needs and new usage. And even if service mesh was not the best solution for Decathlon, it may be a good solution for you, because you have other needs, other architectures in place, another kind of infrastructure. So you should look into service mesh if you have needs around dynamic routing, zero trust, et cetera, et cetera, because it may be a good solution.
But always go back to your user needs. That's the main point of this talk: always go back to your user needs, and don't implement a complex solution just for the sake of it, because it looks cool — because in the end, you will have to maintain it, operate it, et cetera, et cetera. And that concludes my talk. Thank you. I guess we have time for questions.

Oh, hi. Well, first, congratulations on the talk, it's bold. I mean, I was not expecting the final result. But usually when you think about putting in something like service mesh, it's because you are also sometimes foreseeing something that you may use later. In our case, for example, we use it because we wanted to build almost like a platform on top of service mesh, and so on. So what kind of consideration of the future of Decathlon's needs took place in your decision not to use it? I mean, do you see it as something that is already established, and whatever you have with the current solution is okay?

We took a look at what we could do with service mesh in the future. That's why we had some optional scenarios that we wanted to test service mesh against. And in the end, well, we didn't deem service mesh necessary even for those, because we wouldn't have to take them into account right now. It would be maybe a few months from now, maybe a year from now. And, well, in a year from now, lots can happen, and a lot will evolve in this ecosystem. So as there was no big, big priority subject to tackle with service mesh, we said, well, for now, just use the micro-gateway, and use service mesh for other needs — well, marginal needs. So we didn't forbid the use of service mesh. We said, yeah, first look at the micro-gateway pattern, because it will answer 80, 90% of your needs.
And then if you want to implement zero trust because you want to be ahead, well, use a service mesh solution — and we will select one service mesh solution for Decathlon and go from there. The idea is to revisit the needs one or two years from now and see if this decision still holds.

Hi, thanks for the talk. So you decided to expand on your existing solution, the API gateway, but to split it into smaller units and aggregate them at a higher level. How far were you from still choosing service mesh, and in what event would you not have chosen your existing solution? What drove you to actually do that, and was this choice difficult?

The way it went is that we did the study about service mesh, and our API management team came in at some point and said, hey, look, we have this cool new pattern, and we should test it. And basically our users went for it, because, well, it was easier for them to implement this new pattern: it was a tool they use every day and they have a lot of experience with it. In the end it was an easy decision to make, because with service mesh, they would have to learn a new tool, which is a complex one. We would then have to implement it at scale — because we would choose one solution and we would deploy it, standardize it, automate it — and it would take a lot of work for their teams and for our teams. And that's why, as I said, we went for the micro-gateway pattern and said, go for that for one year, two years, and maybe we'll revisit the decision, see if users have new needs or if the pattern doesn't answer all their needs, and then we'll maybe look at service mesh again and maybe implement it.

Yeah, so basically it's because user adoption was natural that you went for it. Yeah, exactly. Thank you.
How did management react when you told them you spent all this time and money on service mesh research to go back to the... I have my manager here, so I cannot answer that. No, I have to answer those kinds of questions. Well, basically, the idea is twofold. First, we did this study and we did it well, so we wouldn't go back on the decision two months from now because one user says, hey, look, there's a cool new service mesh. No, it's a decision for one year, two years, maybe more. So it was good for them, because there is a decision and we won't come back to it. And the other thing is, well, the user needs are what's most important. We took a couple of months to assess service mesh; it doesn't fit the needs. Well, now we know more about service mesh, and maybe we'll go back to it, but the most important thing is looking at your user needs.

Hi, I had a feeling that cross-zone traffic has never been a challenge for you, because I thought you were not paying for that. But saying goodbye to a layer seven protocol and using layer four — did it help you reduce the traffic overhead, and do you have a figure on how much traffic you saved by not using a layer seven protocol?

Well, we are actually using a layer seven protocol, because we are using the micro-gateway — an API gateway, so it's layer seven. The savings are actually more on, you know, egress traffic and inter-regional traffic, and it concerns all the traffic. The idea was that a lot of services are collocated on clusters, and even more will be. So we'll keep the traffic local to the cluster, and we'll save on the bandwidth, on the cloud cost of going back and forth to the centralized API gateways that are in other GCP or AWS projects, or other regions, and so on.

Thank you, everyone. Oh, I've got one question, sorry. In your tests, at what point were you looking at the sidecarless model?
I mean, in a model where we're using service mesh and we need layer seven for our use cases, do you see any benefits in going to a sidecarless model while still using all the waypoints in Istio ambient, or is it pointless in this case?

Well, I talked about the sidecarless approach because I wanted to look a bit into the future. We didn't really take sidecarless solutions into consideration in the study, because we thought they were not mature enough. The advantage I see anyway is that you remove the sidecar container. And the sidecar container, which is the proxy of the service mesh, is quite a hassle, because you may have ordering problems with init containers and so on, and it consumes a lot of resources. So even in a sidecarless approach, if you always use layer seven policies and the traffic always goes through the layer seven proxies, you can save on the operational costs anyway, and on the cost of, let's say, the Kubernetes clusters, because there's less resource consumption. And it's maybe simpler to debug, because you have one proxy per host, or one proxy per namespace or service account. So it's a simpler approach, even if you maybe don't gain much on the latencies and so on.

Thanks, everyone.