Like Lynn said, I'm going to talk today about service mesh and tenancy, and some of the problems that we've run into where I work, here at Solo, where we work with people adopting service mesh. We'll look at some practical examples, I'll try to mix in some demos, and then we'll look at some of the things that we've done as solutions that may be interesting to you all.

So like Lynn said, my name is Christian, a global field CTO at Solo. I am super excited that Rinor and I finally got Istio in Action out the door. That took three and a half years. There's, I think, close to 500 pages of Istio knowledge in there, accumulated over that time, and it went to print back in March. We are doing a Meet the Author. I thought it was a book signing, but we'll give away digital copies of the book, and you can also request physical copies and we can send them out afterward. So come stop by the Solo ServiceMeshCon booth at 510.

So I work at Solo. You've probably seen a few of us speaking here today. We work on solving application networking problems: how do services connect, and how do we drive policy for those services anywhere workloads might be deployed, in Kubernetes, on VMs, across Kubernetes clusters, across private and public clouds. And we bring a lot of interesting open source technology to those solutions. Service mesh is a big part of it, Istio is a big part of it, and eBPF, which we use to help optimize the way the service mesh runs. And now, if you caught Idit's announcements earlier at the keynote, bringing in Cilium and Layer 3/Layer 4 to complement that solution. If you're interested in these technologies, interested in being exposed to a lot of the customers and use cases that we see and the solutions that we push forward, we're hiring, so definitely please reach out.
You may recognize some of us, not only from having spoken here, but as leaders, contributors, and maintainers in the open source projects, Istio specifically. We bring a lot of that expertise to the people that we hire, so it's a good opportunity to learn, as well as to our customers. I'll speak through some of these; I know some of the other sessions showed that.

But getting to the main point, the main reason for this talk: as I was saying, we work with a lot of organizations that are adopting service mesh. Some of these organizations have the largest deployments in the world, or the most complicated, or approaching the most complicated, because of the types of organizations that are adopting them. These are financial services companies, insurance companies, retail companies. A lot of them have been around for a really long time. They have their own ways of running their business. They have organizational policies that you would never have guessed would make sense just by working in open source; you have to actually go see and experience it for yourself.

And one of the things that comes up almost every single time is this: when you are adopting this new technology, when you're modernizing, bringing in containers, bringing in some of these cloud solutions, how do you expose these to the teams? To the developer teams? How do you integrate this into your platform? Tenancy is something that frequently comes up, which is what we'll be talking about today.

I was having an interesting discussion with a customer not that long ago. Of all the customers we talk with, I would have assumed they'd have a big tenancy problem. But they said, no, we don't have a tenancy problem, because we run a single cluster per application. And I was thinking, well, they must have had some tenancy problem at some point, otherwise they wouldn't be running in that mode.
Then I threw this question out on Twitter and got a really, really good response to it. You could see people were adopting and deploying Kubernetes clusters in all of the different tenancy extremes that you could possibly imagine. Some people responded, oh yeah, we run massive clusters, OpenShift clusters or just whatever Kubernetes clusters, and they went into some of the details about the tenancy problems there. Someone even responded that they run a single pod per cluster, which I hope was not real. Nevertheless, go through that thread; there are some interesting things there.

But really, the thing that stood out, not only in that thread but generally when we're talking about tenancy and why it gets hard, is that it involves a lot of different facets. It involves the infrastructure, of course. The way the teams are already organized and already working, which varies between organizations. How you isolate the impact of one team doing something that could disrupt another team, and that itself can be at multiple levels.

Today we're going to talk about this: when you adopt a service mesh and you want to expose the capabilities of a service mesh, which are in some cases pretty broad and very powerful, how do you do that across these various teams? What mechanisms might already exist in the mesh that you might be looking at? Now, of course, I've been involved with Istio for a long time, so I'll be looking through the lens of Istio, but the questions and the scenarios aren't tied to Istio specifically.

So let's take a very simple example where you have a set of applications that you want to expose for consumption. Istio has an ingress gateway, which not every service mesh has, but this is actually pretty useful as a way to get started with a service mesh if you're not comfortable deploying sidecar proxies everywhere: start with a single proxy, get traffic into the system, and incrementally add from there.
It's a good place to start. Now, when you have multiple teams that you might be servicing or enabling with a mesh, or with a technology like this, you have to take into account that you need to expose a service a certain way for one team, while another team wants to own more control over how it exposes its services. In Istio, there's already a split in the API between what happens on the gateway (what port we open, what protocol it's speaking, what security we might want to associate with it, some simple security) and another part of the API that specifies traffic routing, matching, splitting, and so on, which is the VirtualService. So you can already see the API in Istio is thinking along the lines of: what are the different things that we can compartmentalize and configure?

Let's jump real quick to a demo. Oh, I'm offline. I did connect to my phone. Let's see. I wasn't expecting to be kicked off, but we're going to jump, if we can, to a demo where we walk through some of the nuances and details of Istio's API to get to that sweet spot of being able to enable multiple teams to configure various parts of the mesh, starting off, in this case, with a gateway. And it looks like I lost that. So one second. Can you still see the text OK in the back? All right, thanks. All right, we'll go here.

So the first thing we're going to do is take a look at, like I mentioned, the Gateway API and the VirtualService API. Also, this is a live demo. I clearly didn't type that out; I had it scripted, but this is a live demo, and I'm usually terrible at typing live, so you're going to see the script here. The first thing to notice is how we specify the gateway. It's pretty straightforward: we've specified some security properties about it, the port we want to open, and the protocol.
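For reference, a minimal Istio Gateway along these lines might look like the following sketch. The names, namespaces, and certificate secret are placeholders, not the actual demo values:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: web-gateway
  namespace: istio-ingress        # lives with the platform/gateway team
spec:
  selector:
    istio: ingressgateway         # binds to the ingress gateway deployment
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS             # the protocol this listener speaks
    tls:
      mode: SIMPLE
      credentialName: web-cert    # TLS cert stored as a Kubernetes secret
    hosts:
    - "web-team/web.example.com"  # only match config from the web-team namespace
```

The `namespace/hostname` form in `hosts` is the isolation mechanism being described: it restricts which namespaces are allowed to attach routing rules for that hostname.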
We've also specified a hostname where we want to match on traffic, and then eventually delegate that to something else that will specify the routing rules, which is the VirtualService. If you're familiar with this, it's pretty straightforward so far: we just want to expose a service on the gateway. You'll notice a couple of things here, though. The first is that in the Gateway resource, we specify that we want matching to happen, but only for rules that appear in a certain namespace. The second thing is that, from the matching rule side, we can say: I want these rules to be applied to a certain gateway. So there are already mechanisms in Istio that let you split up and isolate exactly where the configuration should be coming from. This is already applied, so let's actually call it. You should be able to see it. Sorry, it's at the bottom there, but you should see the request actually goes through.

The next thing we want to take a look at is a scenario where one of the teams wants to own a little bit more of the traffic rules, but the platform team, or the team that owns the gateway, can't just hand off everything to the end user team. Maybe they need to control things like rate limiting or external auth, but the specific route matching, how traffic gets to a specific service, and how the traffic gets split, we can delegate. Istio does have a delegation API; the VirtualService reference docs go into the details of that. So let's actually try it. We're going to expose a new service that hasn't been exposed on the gateway yet, and we're going to delegate the traffic routing rules to the ratings team. The way we do that, let's apply the gateway real quick, is by specifying that we're going to delegate the rules.
We'll control the top-level domain matching, we can add additional capabilities there that the end user team shouldn't be configuring, and then we can delegate the traffic routing rules to the team. This is what the delegate route looks like; we can see how they're connected right here. So let's apply both of them. Now let's try to call the service. We should see a response like we did in the previous example, but unfortunately we do not.

So Istio does have the ability to delegate the routing table to other teams, but there's some nuance here. If we take a look at the gateway itself, we see that, as in the previous example, we were telling Istio to pull the routing rules from a specific namespace. In this case, we don't want to let the ratings team decide what all of the routing rules are; we're only delegating part of them. The top-level matching rule for the hostname actually lives in the same namespace as the Gateway resource, not with the team. So we've got to understand that and make sure we configure things correctly once we start going down the path of delegation. In this case, we removed the namespace semantics from the matching, and now if we apply it and make the call, it should go through, and it does.

Again, go to the docs, look at the reference configuration under traffic management, go into the Gateway, and the field descriptions explain when to specify a namespace and how the rules get pulled. So that's one component: splitting out the traffic rules, splitting out what configuration belongs to what team, is one of the first things you can start to look at.

Now let's expand this a little bit more. We've looked at traffic coming in, how you start to think about tenancy at the edge.
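To sketch what the delegation looks like (the names here are illustrative, not the exact demo resources): a top-level VirtualService owned by the gateway team handles the hostname match and hands routing off, and a delegate VirtualService owned by the application team carries the actual routes. Note that a delegate VirtualService must not set `hosts` or `gateways` itself:

```yaml
# Owned by the gateway/platform team, alongside the Gateway resource
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings-entry
  namespace: istio-ingress
spec:
  hosts:
  - ratings.example.com
  gateways:
  - istio-ingress/web-gateway
  http:
  - delegate:                  # hand routing decisions to the team
      name: ratings-routes
      namespace: ratings
---
# Owned by the ratings team, in their own namespace
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings-routes
  namespace: ratings
spec:
  http:                        # no hosts/gateways on a delegate
  - match:
    - uri:
        prefix: /ratings
    route:
    - destination:
        host: ratings.ratings.svc.cluster.local
```

The nuance the demo hits is that the Gateway's `hosts` entry has to allow the namespace where the top-level VirtualService lives, not the team's namespace, since that's where the hostname match is defined.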
Now, when services are communicating with each other and different teams are involved, different teams own some of the different services here, and we have to think about what that configuration looks like and what dynamics come up. But before we do that: typically these teams are operating on shared platforms, and these platforms are owned by some team, a DevOps team, a platform team, whatever, that is enabling the mesh or exposing its capabilities. That team probably cares about things like security, lifecycle, high availability of services, failover, topology, that type of stuff. Istio's API has a set of components to drive that, but it may overlap in some ways with what the end user teams start to use.

As for the end user teams, let's say you own a service. In this example, there's a recommendation service which is calling another service, the purchase history service. As the developer or owner of the purchase history service, you care about things like how the hostname gets matched, or CORS, or traffic splitting, maybe because you're introducing a new version. Istio's APIs for doing that are the VirtualService, the DestinationRule, and potentially the Sidecar resource. As the caller of the service, you care about slightly different things: timeouts and retries when you're calling the purchase history service. Istio again has this same API, VirtualService, DestinationRule, and so on, and we've seen teams get confused about who should own that resource. Should the team that provides the service own it? Or should it be shared somehow, with some pull request mechanism to try to get everybody merged and all the configs on the same page? So we see some of the contention that can come up around this API, and we've seen teams trying to mitigate it in various ways. What I'm going to go into real quick is a little bit more detail.
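As a caller-side illustration of the kind of config that ends up in contention (all names here are hypothetical, not from the demo): the team consuming the purchase history service might want a VirtualService like this to set timeouts and retries, while the owning team wants to control routing for the very same host:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: purchase-history-client
  namespace: recommendation         # the calling team's namespace
spec:
  hosts:
  - purchase-history.purchase-history.svc.cluster.local
  http:
  - route:
    - destination:
        host: purchase-history.purchase-history.svc.cluster.local
    timeout: 3s                     # fail the call after 3 seconds total
    retries:
      attempts: 2
      perTryTimeout: 1s
      retryOn: 5xx,connect-failure  # retry on server errors and connect failures
```

If both teams try to express their concerns in one shared VirtualService for that host, you get exactly the ownership question described above.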
We're going to go a layer lower, because Istio's API does allow you to carve up the VirtualService and expose certain capabilities to certain teams without them impacting each other. So let's take a look at demo two real quick.

Imagine for a second there's a web API service: ingress traffic comes in and hits the web API service. That service then calls another service, the recommendation service, and that service calls yet another one, the purchase history service. By default, when we've deployed this in our cluster, Istio configures the web API service to know about everything, and that might not be what we want, especially if we're thinking about tenancy and what the service should know about or be able to connect to.

If we look at the clusters that the web API service knows about, coming over to the command line real quick, we're just asking Istio: tell me about the clusters, the upstream services, that the web API service knows about. We see it knows about a bunch of different things, including purchase history, ratings, recommendation, all this stuff. But web API only calls recommendation. We should configure it so that it only knows about the services in its own tenant space, plus any other services it may need to consume. To do that, we're going to use Istio's Sidecar resource. We're going to trim down the configuration so that the service knows only about what it needs to. In this case, we're saying: you, the web API service, use the recommendation service, so you're going to know about that, and you're going to know about the Istio control plane, and anything else in your namespace, I think, in this case. And that's it. So let's apply this; that got created.
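A Sidecar resource along those lines might look like this sketch; the namespace names are my assumptions, not the demo's exact values:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: web-api         # applies to workloads in this namespace
spec:
  egress:
  - hosts:
    - "./*"                  # services in the same namespace
    - "istio-system/*"       # the Istio control plane namespace
    - "recommendation/*"     # the one upstream web-api actually calls
```

You can check the effect with `istioctl proxy-config clusters <pod>.web-api` before and after applying: the cluster list should shrink to just these hosts.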
Now let's go back and check the upstream services that the web API service knows about: it knows about recommendation, and it knows about some helper services in the control plane and so on. It doesn't know about the whole list, and it doesn't need to. So we've started to trim down and focus the configuration for a particular tenant. Now we'll run this again for the rest of the teams who own different parts of the application set we have here, and we'll speed this up a little bit.

OK, the last bit we're going to focus on is what happens when we apply a VirtualService, or a DestinationRule, or any of the configs where we might need to control the connectivity between services. One thing I pointed out was the contention we see between two different teams trying to share a VirtualService, for example. But we don't have to share a VirtualService. In Istio, we can take a VirtualService and explicitly say that this configuration, these routing rules, only applies for services that live here in this namespace. Maybe this is where the services are actually running. In this case, this would be the web API VirtualService; we can expose it directly into the namespace where the web API service actually runs by putting the VirtualService there. Clients that care about timeouts, retries, and all this other stuff that might differ from what the server is providing can create their own VirtualService and put it in their own namespace. And we control that with the `exportTo` field in the VirtualService. So we can get very fine-grained about where Istio's config appears and how it's applied to the end workloads. So let's apply this, and we'll apply it to some of the other services here real quick. I guess I didn't reset the demo. We're also going to take a look at another configuration in Istio called the ServiceEntry.
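For illustration, scoping a VirtualService to its own namespace with `exportTo` might look like this (hypothetical names, not the demo's exact resources):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web-api
  namespace: web-api
spec:
  hosts:
  - web-api.web-api.svc.cluster.local
  exportTo:
  - "."                      # visible only within this namespace
  http:
  - route:
    - destination:
        host: web-api.web-api.svc.cluster.local
```

`exportTo` accepts `.` for the resource's own namespace, `*` for all namespaces, or a list of specific namespace names, which is what makes the fine-grained scoping possible.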
The ServiceEntry allows you to configure access to services that live outside of the mesh, or to build a globally aware name that can then be used to resolve workloads that live in the mesh or across clusters. In this case, we've created a globally aware name called recommendation.istio.io, and the mesh then knows how to resolve those endpoints. But we might not want every single tenant, or every cluster, especially if you start thinking about this in multi-cluster terms, to know about this recommendation service. We might only want to put this ServiceEntry in certain tenants, certain namespaces, in certain clusters to control the visibility, even though it's a globally routable service. Sorry, the 40 hours of travel are catching up with me right now. So in this case, we explicitly exported the ServiceEntry to our web API namespace, and now the global recommendation service is going to be available and routable from the web API namespace. So yes, we should be able to call it, and everything should continue to work.

OK, come back here. So what we just saw, oops, microphone, what we just saw is a couple of the low-level capabilities in Istio for controlling how configuration is exported, how configuration can be tailored or cut down, and how we can do things like delegation to other teams. All of these form the foundation for building a tenancy model on top of the service mesh. But if you were following along with me on the demos, it's pretty tedious, even for a single team, to think through where each piece of configuration needs to live, where it needs to go. And now you start to expand that: you mix in delegation, the gateway host matching can deal with namespaces, and is that VirtualService really there?
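A ServiceEntry scoped with `exportTo`, roughly along the lines of the demo, might look like this. The hostname matches what was shown; the namespace, ports, location, and resolution are my assumptions for a self-contained sketch:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: recommendation-global
  namespace: istio-system    # assumed: managed by the platform team
spec:
  hosts:
  - recommendation.istio.io  # the globally aware name from the demo
  location: MESH_INTERNAL    # backed by workloads inside the mesh
  resolution: DNS
  ports:
  - number: 80
    name: http
    protocol: HTTP
  exportTo:
  - web-api                  # only visible to the web-api namespace
```

Because `exportTo` lists only the web-api namespace, other tenants and namespaces never see the global name, even though it's routable for the teams that should consume it.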
Is it exported correctly? Now you take this across multiple services, hundreds of services, across clusters, and trying to manage the tenancy model in your head is not ideal; this can get complicated very quickly. So what we work on with our customers at Solo, and what we've seen people adopt in the community, is building blocks and workflows on top of the constructs that Istio gives you. Istio's API was intended to be a lower-level API with workflows built on top of it, and the tenancy construct is one that people have built. If you look at the talks from IstioCon this year and last year, you see organizations that have talked about the journey of building these types of constructs. Basically, it comes down to: who decides what this workflow is? Where does it live? How do you make it consistent? How do you make sure there are no errors? Like I said, this can be pretty tedious and error prone, and lead to issues.

We've worked on this for a long time, and I think you've seen some of the learnings we've shared earlier today: we've combined GitOps with a smart controller that understands higher-level policies. It can build a tenancy model, do the translation to Istio resources, and set `exportTo` and the Sidecar resource correctly, and all this stuff, so that you're not trying to get it right by hand. As an overview: instead of saying here's one API, the VirtualService, that does all of this stuff, why don't we bring the level up a bit higher and say, you focus on correctly configuring circuit breaking, or these resilience pieces, timeouts, retries, whatever, and then use labels and selectors to apply that to a workload, a bunch of workloads, an entire tenant, or an entire platform.
So we've broken the API down a little bit more, made it a little higher level, and then built a tenancy model around that. We call it workspaces: basically a bucket or grouping of service mesh policies that then get translated, come back here, to the Istio resources that correctly handle those lower-level tenancy constructs. And all of this can be driven by GitOps. Alex did a lightning talk earlier about the controller model and GitOps and how this all ties in; go take a look at that.

I don't have time for another demo, and I really want to leave you with links for learning more. We do workshops on this type of material in depth, and we offer certifications, so go check out these links. And like I said, we're hiring, so certainly reach out with questions, and I look forward to seeing you at booth 510 for the book giveaway. Thanks.

Awesome. I don't believe I could speak after 24 hours stuck on a plane, not to mention show a live demo. Well done, Christian. Thanks.