Thank you, good morning. We are really proud to be here, and we want to thank the organizers for choosing our talk. "Be creative, be independent, and take responsibility for your actions." This was one of the 31 standards that Adi Dassler, the founder of Adidas, defined during the last century for his shoemaking company, and it is one of the standards that we try to follow daily in our work, among others. It has also held very true while we have been integrating Linkerd in the organization. We will show you how we've done this integration at a minimal cost.

To give you a little bit of the history of how we reached this point: before 2016, the Adidas e-commerce site was generating less than a billion euros yearly. Right now, we are over four billion. If you want to know more about how we started this journey, you can watch the talk shown below; you can download the slides and take a look. It's very funny. It was given at KubeCon in Copenhagen.

There were two main drivers that made this change possible. One of them was the foundation of the platform engineering department, and the other one was the adoption of Kubernetes. And day by day, we keep learning along this journey.

But what is platform engineering? The way we have established platform engineering at Adidas, we have not given the developers one single golden path. We have a really diverse set of applications in the Adidas landscape, so we decided to build the platform as different building blocks. We have different teams, every team provides a certain set of tools for the applications, and then the applications can pick and choose whatever makes sense for their purpose.

So imagine we have an application around, and the team wants to deploy it. They could do it elsewhere or not, but let's say they deploy it on the Container Orchestration team's clusters, so it is running on Kubernetes. But they might need some additional features. They might need a database, so they go to another team, called the Landing Zone team, and request an AWS account where they can set up a database. They can go to the Fast Data team and get a Kafka topic to push some data there. But then they are blind: they need to see something about the application, they need some observability. So they can go there and play with all the monitoring tools; for instance, they can push all their metrics into VictoriaMetrics and see them in Grafana, and they can push all their logs into OpenSearch. As I said, they can pick and choose. They may need some other tools to build the applications, to secure them, to create certificates, or to store some secrets. And at the end, if they want to publish the application towards the internet, we have Kong as an API gateway. We have more tools than these; it's not as broad a selection as the CNCF landscape, but it's quite big, and they can select what they need.

Now, the way we have integrated Linkerd. The first thing we did was decide that we wanted a different identity trust domain for each cluster, so every single cluster has its own trust domain. We also have a specific CoreDNS setup, so everything is discoverable, and we also have our own CA per cluster. We will now get into the details of that.
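To make the per-cluster identity concrete, here is a minimal sketch of what such a setup can look like with the Linkerd control plane Helm chart. The trust domain and the CA bundle here are assumptions for illustration, not our actual values:

```yaml
# Hypothetical Helm values for the Linkerd control plane chart.
# Each cluster gets its own identity trust domain and its own CA.
identityTrustDomain: shop-dev-1.cluster.internal   # hypothetical per-cluster domain
identityTrustAnchorsPEM: |
  -----BEGIN CERTIFICATE-----
  ...this cluster's own CA bundle goes here (placeholder)...
  -----END CERTIFICATE-----
identity:
  issuer:
    # Read the issuer certificate and key from a kubernetes.io/tls secret,
    # so an external tool (cert-manager in our case) can rotate it.
    scheme: kubernetes.io/tls
```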
Also, if you look at all the Linkerd documentation about "bring your own" whatever, I think we have done almost all of it to adapt Linkerd to our own stack. So let's get a little bit into the details, starting with the certificates and the identity.

Adidas has a central certificate authority, and from that one, the security team decided to create another one, so they can, let's say, cut the cable if it is ever compromised. These CAs are stored in Venafi. Then we have, again, the clusters with Linkerd installed. To create the cluster certificate, we have another tool in the cluster, the External Secrets Operator, and with it a ClusterSecretStore. What it does is define a way to fetch a secret from Venafi, which is then used to request a certificate. Basically, this webhook provider is just a way to submit certain credentials to Venafi, and with that, we obtain a token. That token is the one we use with cert-manager, and cert-manager is then able to issue the certificate for the cluster, which is an elliptic curve certificate, fully compatible with Linkerd. At the end, we also have some other automation when we deploy the cluster, which creates the internal policies on Venafi. I don't know if it's really visible, but we end up with a different configuration per cluster in Venafi, thanks to all this automation.

About the observability: similar to what I said before, we have given a couple of talks if you want more details on how our observability platform works; you can watch these two talks, one from PromCon in Munich in 2019, and another one from the Kafka Summit two years ago. Basically, again, we have our Container Orchestration cluster with Linkerd. Instead of using the default Linkerd viz Prometheus, we run, let's say, our own configuration of Prometheus, which scrapes the data plane and also the whole control plane, and we have a remote write configuration to push the metrics into one custom component that we have, called prom-to-kafka. What it does is submit all these metrics into a topic on our Kafka platform. Then on the other side, on the observability team, there is, let's say, the opposite component, called kafka-to-prom, which reads all the metrics and puts them into VictoriaMetrics. So it uses Kafka a little bit as a buffer, and then we are able to see all the dashboards in Grafana.

At the end, it was as simple as this: we are still deploying the Linkerd viz chart, but we are just disabling the default Prometheus and replacing the URL that the metrics API component uses to read the metrics; you can see a sketch of that change below. And this is what we get: we have a central point at Adidas where you can go to these Grafanas, just switch between the clusters, and you have the whole observability of the clusters there.
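As a sketch of that "as simple as this" change, the values for the linkerd-viz chart look roughly like the following; the Prometheus URL is a hypothetical in-cluster address, not our real one:

```yaml
# Hypothetical values for the linkerd-viz chart: disable the bundled
# Prometheus and point the metrics API at our own Prometheus.
prometheus:
  enabled: false
prometheusUrl: http://prometheus.observability.svc.cluster.local:9090  # hypothetical URL
```

And on our own Prometheus, pushing everything towards Kafka is a plain remote write section, something like this (the prom-to-kafka endpoint is also an assumption):

```yaml
# Hypothetical prometheus.yml fragment: forward scraped metrics to the
# custom prom-to-kafka bridge component.
remote_write:
  - url: http://prom-to-kafka.observability.svc.cluster.local:8080/receive
```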
About the traces, it's quite similar. Again, we have the clusters from our team, and the application, which is meshed, pushes its traces into an OpenTelemetry collector. The observability team also manages a Kubernetes cluster, so they have set up some additional infrastructure with some other OpenTelemetry collectors, where they have some specific configuration for the tail sampling. They are also injecting some metadata, like, for instance, which cluster the trace is coming from, or which application; I mean, they add some metadata, and then everything is pushed into Tempo, with the traces stored in S3. Again, with Grafana, we are able to see all the traces from all the applications in the clusters.

OK. So these are the components we have integrated, or rather, this is how we have integrated Linkerd in the company. But we have tried to go a little bit beyond that and give something back to the community. For instance, on the integration of the certificates: although it is cryptographically valid to have a trust chain of RSA certificates and then one elliptic curve certificate at the end, the linkerd check CLI was not able to identify this as a valid setup. So we made a patch for the Linkerd CLI and submitted a PR just to allow this setup. Now, every single company that has a setup like this one can really validate that their Linkerd installation is fine.

Related to Linkerd viz: we have multi-tenant clusters. For instance, we have one really big cluster for development, where most of the applications live, and we can have more than 1,000 users on that single cluster deploying all their development versions, with a quite strict multi-tenant configuration. If people tried to use Linkerd viz to get some data from the CLI, it was not really possible, because it was asking for excessive permissions. So we also worked with the Linkerd team to reduce the surface of the permissions that are requested. It is still secure, but it is now enough to provide all this information to the teams in a multi-tenant setup.

Also, and this PR is still pending to be submitted, we have some changes for vcert, which is another CLI that is used with Venafi to configure the policies and everything. The thing is that it was unable to do certain operations; with these changes, you are now able to create these policies from the CLI as well, and store the certificates correctly. And all of this has been contributed back to the community.

So my take on this is: please, don't be shy, and share your ideas. If you have issues with any open source project, especially with Linkerd, open issues on their GitHub, or even propose changes if you have new ideas, or start discussions. You can also join the Slack and try to help people; you can even just share your experience of how integrating Linkerd went for you. Give back your code: the same way we have been working to contribute some PRs back, try to do it. If you are not allowed to, or you don't have time, or for whatever reason, maybe you can also add comments, because those can be really valuable too. And finally, spread the word. Everybody wants to hear your story; we have learned that with the talks I've shown, and I hope you are finding this one interesting as well. But especially if you have failure stories, people are really willing to hear those. And now I hand over to Miguel.

Good morning, everyone. I'm Miguel, and I'm an SRE at Adidas, in a team called Commerce Next. We are the next iteration of the e-commerce services at Adidas, and we are using this opportunity to provide core services in a way that makes things easier, to ditch legacy code and, you know, just take a fresh look at how we can do everything. So why did we choose to be the test pilot for Linkerd at Adidas? The main reason is that we are so new that we still don't have every service in production; we have this 50-50, or 70-30, split of services in production.
So in the 30 that are in production, we have some leeway, and we have some playground where we can do stuff. And the reasons we decided to go with Linkerd are that the development moved fast and we had some gaps, mainly observability gaps. The tracing was the same way: since we did manual instrumentation of the applications, every Grafana panel we built was custom-made for one application, so we couldn't reuse them. And then, because of some technical issues, we have a very convoluted authentication stack that we want to improve.

So how did we adopt Linkerd? We had a quick meeting between six people, and we set a very simple target: in one quarter, we should have at least one service with Linkerd enabled in production. And it had to be a production service. We took a "one, some, several, all" services approach. That isn't all the microservices or applications Adidas has in production; I'm talking just about the Commerce Next services. And we took this approach for every one of our dev, staging, and production environments. I have to say that we silently missed the target: we didn't do it in one quarter, and we will see why later. And the operationalization of everything wasn't included in the scope; it was our best effort. We wanted to do it as well as possible, but it wasn't something that we committed to fully. It was a small team, two people, and we weren't fully dedicated to the project. Our main mission during this quarter was business as usual, so it was mainly in our free time or downtime that we pushed Linkerd a little bit, keeping in mind that one production service in one quarter was the target.

And from the application perspective, it was completely painless. We only had to add that little annotation there and everything just worked; there is a sketch of it right after this section. Obviously, after all the work that Danny did. But it was easy.

So obviously, we found some blocking issues. The major problem we found was friction and internal corporate processes. Adidas is not a small company; it was pretty quick, but we still had to comply with security and get approval from the right people, and I would say that was the thing that slowed us down the most. Then, the team wasn't fully dedicated: again, business as usual; we spent around 10 to 15% of our time during this quarter on Linkerd. The other issue is that we moved too fast, because we have three different environments and one quarter, which means one environment per month. So imagine that you have a critical piece of software that you are going to deploy worldwide, in everything, and you have one month to test it in dev; and whether you are happy or not, you have to move to staging to keep up with the timeline. In retrospect, that is too fast; maybe that's the reason we incurred that slight delay.

And then we had two failures from Linkerd: the proxy wouldn't start, and this was due to two upgrades that we did to the clusters. The only problem was that it was the worst timing ever. The first one happened while I was trying to pitch Linkerd to another team, and the second one happened when that other team came back to me, trying to adopt Linkerd. So really, really bad timing. We still have to do some failure analysis; we have some bullet points that we would like to explore to see how a hypothetical failure of Linkerd would affect us. So there is still some homework to do.
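For reference, this is roughly what "that little annotation" looks like. Everything about the Deployment itself here is hypothetical; the only real piece is the linkerd.io/inject annotation on the pod template:

```yaml
# Minimal sketch: meshing a workload by annotating its pod template.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-api        # hypothetical Commerce Next service
  namespace: commerce-next  # hypothetical namespace
spec:
  replicas: 2
  selector:
    matchLabels:
      app: checkout-api
  template:
    metadata:
      labels:
        app: checkout-api
      annotations:
        linkerd.io/inject: enabled   # the one line that adds the Linkerd proxy
    spec:
      containers:
        - name: checkout-api
          image: registry.example.com/checkout-api:1.0.0  # hypothetical image
```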
So once we got Linkerd in production, what did we get? Well, we got improved observability, because we are moving away from the per-application Grafana panels to Linkerd panels. That allowed us to instantly reduce the number of panels, so we no longer have to work on a separate panel per application to improve observability across all our applications; now we can do it once. We got some performance improvements. They are not that big, because in the backend area I'm working in, the requests are very short-lived API requests; so we have seen some performance improvement, but not by a large margin. And then we got a lot of integrations already done: with the metrics come Grafana panels that include the per-route metrics we used to instrument manually, and we suddenly get the same alerting for every application, so we don't have to maintain a separate set of alerts per application.

And then we have retries. Retries are important for us. In particular, one of the applications we have in production right now is having some issues with a third-party provider, and this is in the final checkout for the end customer, so those failures impact the Adidas bottom line directly. (And the slide is jumping around. OK. Now it's not. Now it's back. OK.) So, the retries. It was impacting our bottom line, and we had two options: we could use the Linkerd that we already had in place and implement the retry there, or we could modify the application and implement the retry in code. We went with Linkerd, and we actually saw a reduction in failed requests on what was, and still is, a critical path in the user journey. There is a sketch of that configuration below.

We have some things pending; again, one quarter, 10% of the time. We want to explore cross-cluster communication, the access control lists for Linkerd, and traffic splits. We have some homework that we still have to do: we still don't have the cluster ingress meshed, so there is some lack of observability there. Circuit breaking and dynamic routing are something we were hoping to get soon, and in the latest release of Linkerd, we got it; we haven't had time to test it yet, but we are excited to see what we can do with that. And then we have a little wish list of things we would like to see improved: rate limiting, and especially external-services observability, because for the third parties that we depend on, there is currently no easy way to mesh them.

So, in summary, what did we get? If you have two people and you can spare them for x amount of weeks, you can get a service mesh in production, which is great: around one quarter, alongside business as usual. We instantly had our observability gaps completely filled, closed. And we are able to reuse our work; before, it was a huge investment in time, with one person building every single panel, and a lot of redoing the same work. We also already have the knowledge and the infrastructure pieces in place: so now, with the new release, we can do circuit breaking, and with the traffic splits, we can improve our chaos engineering. Then, once we have a little bit more experience with Linkerd, we get cross-cluster failover, which will improve our resiliency. We are experiencing some cost reduction, because we are getting better performance. Hopefully, once we implement the ACLs, we won't have to make the little round trip that we are doing right now for the authentication stack to work. And then, by default, with mutual TLS, we improve our security stance inside the cluster, which is something that we had to do anyway; now with Linkerd, all of that is done for us, and I think that's great.
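Since we didn't show the retry piece on a slide, here is a hedged sketch of how a retry like ours can be expressed with a Linkerd ServiceProfile. The service name, route, and budget numbers are all assumptions for illustration:

```yaml
# Hypothetical ServiceProfile: let Linkerd retry a flaky route instead of
# changing application code. Retries apply only to routes marked
# isRetryable, and are capped by the retry budget.
apiVersion: linkerd.io/v1alpha2
kind: ServiceProfile
metadata:
  name: payments.commerce-next.svc.cluster.local  # hypothetical service FQDN
  namespace: commerce-next
spec:
  routes:
    - name: POST /authorize
      condition:
        method: POST
        pathRegex: /authorize
      isRetryable: true          # only safe if the call is idempotent
  retryBudget:
    retryRatio: 0.2              # at most 20% extra load from retries
    minRetriesPerSecond: 10
    ttl: 10s
```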
And I think that's it. Thank you.

Thank you. Does anybody have any questions? Right after this, we will have a little bit of a break, but if you have questions, we will do our best to get to them.

So, hi there. My name is Victor Stajak, and I would like to ask if you have any kind of statistics on what the performance impact was when you introduced Linkerd. Because you run at a pretty large scale: how much did your CPU usage rise, and maybe memory? That would be nice to know.

So, regarding the performance, we were mainly worried about the resource consumption, because we are adding a little proxy to every service; that was what worried us the most. In our initial tests, the latency induced by the proxy was pretty minimal. We didn't do a comparison with other mesh providers, because we were not going to deploy everything into production, but the latency was around 2 or 3 extra milliseconds, or something like that. Keep in mind that the area I'm working on is basically the backend of Adidas, so a lot of our calls stay inside the cluster, and the biggest part of the latency we see on a call is usually either DB queries or third-party queries. So for us, the benefits of having Linkerd outweigh the few milliseconds here and there that we are paying in latency.

What was the total RPS?

It really depends. It really depends.

All right, I'll put you on the spot: "it depends" as in you have a ballpark, or "it depends" as in you really have no idea?

No, the thing is, as I said before, we have multi-tenant clusters for production as well, so it really depends on the marketing campaigns that are being run in each country. I don't know if you are aware of the, let's say, different hype drops that we have. We have special campaigns which are really special, in that a lot of people want to buy those shoes or those articles, and then we get something like a DDoS attack on the cluster. So it really depends on the moment of the day. I don't have the exact numbers for the normal requests either, sorry.

Thank you. Anything else? Yep, one moment.

Thanks, first of all. I was wondering what kind of benefits Linkerd provided you in the observability stack. Is the only benefit that you can have a more general, more templated panel, if you will, or is there also a benefit in the detail of the metrics you can show?

So, regarding the observability stack, we had two main issues. The first one was that we are using a diverse set of languages: Java, Node.js, a lot of different things. And every instrumentation we do in those applications usually has its own format for metrics. So yes, we did have the metrics, but the metrics were named different things, like the application name and then bucket-this, bucket-that; the histogram buckets were configured differently, so we had a lot of different buckets that we had to tweak manually. That meant a lot of manual work that we had to do for every application. From the dev perspective, it's not that bad, because, you know, "oh, I already have metrics, it works." But for my team, for every team, it's additional toil that shouldn't be needed at all. So with Linkerd, what we are doing for critical applications is keeping this manual instrumentation on some paths that are critical to the application, because we want those particular metrics.
But if you have a new application and you just enable Linkerd, you suddenly get maybe 70% of that observability with the default Grafana panels that we are developing, because we are using the metrics provided by Linkerd for the upstream and downstream traffic, then the metrics provided by the cluster for performance, and then the basic, I don't know, OpenTelemetry metrics or whatever SDK you are using. That gives you a coverage of around 95%, let's say. So that is the biggest improvement for us: we don't have to redo the work again and again and again for every application. The other option we had was to rewrite the metrics in Prometheus, but that adds another manual step that you then have to maintain, and for every new application you have to go through every step of the way again; there is a sketch of what that would look like at the end. So again, it's manual toil, manual work that shouldn't be done in 2023.

All right, thanks very much.
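For completeness, here is the rejected alternative mentioned in that last answer: rewriting metrics at scrape time would mean maintaining per-application rules like this hypothetical Prometheus fragment (the job, target, and metric names are assumptions):

```yaml
# Hypothetical scrape config: normalizing one application's latency
# histogram name by hand. Every app and every metric would need its
# own rule like this, maintained forever.
scrape_configs:
  - job_name: checkout-api
    static_configs:
      - targets: ["checkout-api:9102"]
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: http_server_requests_seconds_bucket      # framework-specific name
        target_label: __name__
        replacement: request_duration_seconds_bucket    # normalized name
```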