Hello, hi, KubeCon. Here's a question for all of you who are joining just now. We're going to talk about service mesh specifications. How many of these do you know? Think about it. Service mesh abstractions, not necessarily service meshes themselves; there are quite a few of those. But how many specifications are you familiar with? Do any come to mind? If so, mention them in the chat. Let's see how many you can rattle off. We'll go through all of them today, so no worries if you're unfamiliar. By the way, my name is Lee Calcote. I'm joining you today from Austin, Texas. I wear a number of hats, and I've been focused in this ecosystem for a long time. One of the hats that I wear is as co-chair of the CNCF Special Interest Group on Networking, covering all of the CNCF projects that fall within that domain: Linkerd, CoreDNS, Envoy, gRPC, Service Mesh Interface, Kuma, and the list goes on. I stay busy with that hat on, but I wear another as a Docker Captain, and another as an author on the subject of service meshes, authoring a couple of books at the moment. We'll see if I make it out alive; I'm not sure. We're going to talk about a lot of service mesh things today. If we go too fast or you miss anything, you can see the URL in green, and you can revisit the slides to your heart's content. I have a full-time focus within the Layer5 community. Layer5 is a community dedicated to service meshes, to helping people adopt and operate them, and do so confidently. We're going to talk about some of the projects that the open source contributors work on within that community. It's a warm and welcoming community; sign in and join up. Another Slack workspace is just what you need, right? This one isn't quite like the others.
The other thing that I'm really fortunate to have today is a maintainer, an open source maintainer from that community, joining and co-presenting work that he's been doing for some time: Kush. Hi all, I'm Kush Trivedi. I'm a senior-year undergraduate at IIT Jodhpur, and I have been focused on service meshes and their specifications for around six to seven months. I have been involved in the Layer5 community as a member, as a contributor, and as a maintainer. We are doing lots of interesting stuff over there: working on Meshery, working on the service mesh performance specification, working on an SMI conformance tool. So yeah, we have quite a few exciting projects, and we have a welcoming community. Nice. Good. Well, let's talk about the journey, the cloud native journey to service meshes. I'll say this with as much sarcasm as I can: maybe you're tired of hearing about journeys, but rest assured we are all on one. The journey doesn't look quite the same for everyone, though I think the further you step back from it, the more the journeys look the same. For a lot of us, the advent of cloud native came with the popularization of containers. Thank you, Docker. Thank you so much. Docker was first announced a little over seven and a half years ago now. We saw a number of container runtimes come, and a number of container runtimes are still here. From there, containers took off like wildfire. We got a lot of them, and it turns out you need an orchestrator to help you wrangle that sprawl. We saw a number of orchestrators come, and we see a number of orchestrators around. I don't know if it was today or yesterday, but HashiCorp's Nomad 1.0 was just announced. So orchestrators are a thing, and so are service meshes. If you measure from when Linkerd v1 was announced, that was a little over four years ago, and it went generally available about three years ago.
So service meshes have been a hot topic for the last few years, and they continue to be, rightfully so; they're quite a powerful piece of technology. A lot of the power is yet to come, from my perspective. Not everyone quite understands the capabilities of meshes as they are promoted and spoken about today, and for my part, I believe that there's a tomorrow in which data plane intelligence really matters, and matters to how people write cloud native applications. So come along on the journey, I suppose, is the thing. There are any number of service meshes out there, and as a matter of fact, one of the community projects is to track a landscape of all of the meshes. This landscape was just updated today, actually. It's been a busy day, and it's kind of late here. The landscape was updated with the NGINX Service Mesh; if you're not familiar with it, or missed the announcement a couple of weeks ago, go get familiar. There's a lot to say about each of these service meshes and how they work: their architecture, why they're made, who they're focused on, what they do, when they came about, why some of them aren't here anymore, why we're still seeing new ones. A lot of things to go through. You'll probably be interested in any number of the details that this landscape tracks, so visit it, use it, reference it heavily, if you're not already aware of it. We did pull out a few strengths, just as a sampling of some of the service meshes that are available. Network Service Mesh, on the right-hand side, is sort of the service mesh's service mesh, if you will. It has a lower-layer focus, and by layer, I mean OSI layer. It's a highly performant mesh. Layers five through seven tend to be more the focus of the rest of the service meshes, so NSM offers some unique value in that regard. The other NSM, NGINX Service Mesh, interoperates with NGINX Plus.
And that's what it uses as its proxy in the data plane. So it's a really good proposition, considering that, if I recollect correctly, there are somewhere north of 400 million deployments of NGINX out there, as people continue on their journey from using proxies in standalone environments, to having lots and lots of those proxies, and ultimately arriving at a service mesh. I won't go through the rest; there's a bunch of things for us to talk about in this talk, but go visit the landscape. It'll help you navigate these waters a bit. I'll say that I, for one, am pleased that there are various choices. There are different needs out there; there are different organizations with different needs. And what we're answering on this slide is actually the first question that I asked, which is: how many abstractions are there? How many specifications, how many standards have come to the rescue, so to speak, for understanding and interoperating with the various service meshes that are out there? One of these is SMI. Again, I'm fortunate in many regards in this space, but I'm a maintainer of SMI. We'll describe SMI more deeply here in a couple of slides; it's focused on lowest common denominator functionality across service meshes. SMP, service mesh performance, is focused on describing and capturing the performance of a service mesh: the overhead, or, another way of looking at it, the value, and characterizing that. Hamlet, for multi-vendor service mesh interoperation, is focused on federation; it has a couple of APIs to that end, and I like to think of it as service catalog exchange. They are each complementary in nature, and each is a specification addressing a particular area around service meshes.
Something that I think isn't always obvious to folks is this piece of value that people get from a service mesh, and actually from the specifications that we were just mentioning: the fact that teams are decoupled when you're running a mesh. Developers get to iterate independently of operators, and operators get to make changes to infrastructure, to the way that applications behave, independent of developers, in the presence of a mesh. Both of these teams are significantly empowered. As a matter of fact, there's an alternative version of this slide that talks about the product owner; the product owner is a third persona that is also significantly empowered in the presence of a mesh. There are some advanced use cases there that I've spoken to in a number of other trainings and talks. Catch me in the chat if you want to hear about those use cases; they're pretty interesting. So everybody gets a piece of power, I guess, when they deploy a mesh. Because there are many meshes (by the way, did I say that there are a little over 20 of them being tracked on that landscape?), those abstractions have come along, and there's a need for them, even though it's not necessarily the case that most organizations are running multiple service meshes at the same time. There's the one or two percent, or five percent, of the world that's in that scenario for any number of different reasons, but by majority, that's not the case. Irrespective of that, infrastructure diversity in the enterprise is a reality. We've had any number of service meshes for the last few years, and near as I can tell, we're going to have any number of them for any number of years more. Certainly, just like any other market, we'll have some that enjoy much more adoption than others. But we've been out, as a community, answering a lot of questions: which mesh is best for me? How do I get started? How do I operate these with confidence?
Should I be adopting one of these specifications? And if I am, how do I ensure that my implementation, or the service mesh's implementation, adheres to the spec? Lots of those questions we try to answer, and we work within an open source project called Meshery. There are a lot of things to talk about with Meshery, and we'll show you some of the service mesh specifications using Meshery. So let's take a moment to look at the architecture of Meshery. If you hail from a network engineering background, some of these next terms are going to be quite familiar to you. If you don't, then as long as you've spent some time around Kubernetes, the terms data plane and control plane are probably conceptually somewhat familiar to you, and they are certainly two of the logical planes that generically comprise a service mesh. There's a third one that Meshery lands in, and that's the management plane. There's a lot of comparison and analogy to be made here to how physical networks run, with the data, control, and management planes that are represented there. As we collectively, or at least part of us, live in the cloud native ecosystem, we live within a software-defined networking landscape, and service meshes in some respects are sort of a next-gen SDN. Management planes can do any number of things. They help bridge the divide between other back-end systems and service meshes. They also help do things like performance management and configuration management: making sure that you're implementing best practices, taking common patterns and applying them to your environment, maybe facilitating insertion of WebAssembly modules and filters into your proxies. These are things that Meshery does. Meshery was also a release partner with SMI and implements SMI. It is also the canonical implementation of SMP. It supports nine different service meshes today, with individual adapters written for each of those service meshes.
It does leverage SMI, but it also uses different adapters for each service mesh to allow those meshes to expose their differentiated value. Let's take a look at our first specification: SMI, Service Mesh Interface. SMI's goal, its genesis, was born inside of Kubernetes, I guess is one way of articulating it; its focus is on being a specification that is native to Kubernetes. Its focus, as I said before, is on lowest common denominator functionality, or, a different way of saying that, on bringing forth APIs that highlight and reinforce the most common use cases that service meshes are being used for currently, while leaving space and providing extensibility for additional APIs to address other service mesh functionality as more people adopt meshes and make other use cases well known. There are four specifications today, and they're listed here. One defines traffic splitting and how traffic splitting can work, so that you as a consumer or user of a service mesh can configure and control a service mesh using this standard specification for something like a canary, doing that in a universal way, a standard way; and so on for the other use cases, around access control and around metrics, exposing and highlighting the most common metrics that people look at. If you're familiar with Sonobuoy from the Kubernetes project, or as a project related to Kubernetes: Sonobuoy is a utility that's used to identify whether or not a Kubernetes distribution (of which at one point there were near 100, maybe over 100 now) is in fact Kubernetes; whether or not that piece of software labeled Kubernetes in fact adheres to the Kubernetes APIs and, like I said, is Kubernetes. So Sonobuoy runs conformance tests. The same thing is needed for SMI as a standard specification.
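To make the traffic splitting idea a little more concrete, here is a small sketch of what an SMI TrafficSplit resource looks like; the service names and weights are invented for illustration, and the API version may differ by SMI release:

```yaml
# A 90/10 canary split, declared once in the SMI API and enforced by
# whichever conformant mesh is installed (service names are hypothetical).
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: checkout-canary
spec:
  # The root service that clients address.
  service: checkout
  backends:
  - service: checkout-v1   # stable version keeps most of the traffic
    weight: 90
  - service: checkout-v2   # canary receives a small slice
    weight: 10
```

The point of the spec is that this same resource works against any SMI-compatible mesh, rather than each mesh having its own canary configuration format.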
There needs to be a definition of what conformance means for a service mesh. Right now, I believe there are seven service meshes that claim compatibility with SMI, and there's been a community effort, an open source effort, to create service mesh conformance tests to assert whether or not a given service mesh is compatible with SMI. In order to facilitate those types of tests, you've got to have a tool to, well, provision those seven service meshes, provision a sample application on those service meshes, then generate load and test whether or not traffic splitting, as an example, behaves like it should with that service mesh's implementation. And then you need to be able to collect the results, guarantee the provenance of those results, and publish them. So, as a community, we turned to Meshery as the tool to implement SMI conformance, and we've been working with the individual service meshes to validate their conformance. Let's take a moment to do a demo, to look at what it looks like to validate SMI conformance using Meshery. So, what I'm going to do here is use Meshery as the tool to validate that conformance. We'll come over and spin up Meshery locally. We'll use mesheryctl as the command line interface to work with Meshery. With that command line interface, we'll do a mesheryctl system start. It'll pull the latest and greatest Meshery, start that up, open up your browser, and drop us in here. We can interact with a number of different service meshes. The service mesh that we're going to work with today is Open Service Mesh. Open Service Mesh is one of those seven that are compatible with SMI, so let's put it to the test. Just before we do, let's get familiar with my environment a little bit. I'm on a Mac. I'm running Docker Desktop. I'm running Kubernetes inside of Docker Desktop, and I should be running a relatively clean system. Yep, that's a fairly fresh install; just Kubernetes. We are connected.
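For reference, the local spin-up described here looks roughly like the following; treat the exact commands as a sketch, since install options and defaults change between Meshery releases:

```shell
# Install mesheryctl, the Meshery command line interface (one install option)
curl -L https://meshery.io/install | bash -

# Start Meshery locally; this pulls the latest images and opens the UI,
# typically served at http://localhost:9081
mesheryctl system start

# Verify that Meshery's components are up
mesheryctl system status
```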
Let's go over to Open Service Mesh and install it. OSM should install fairly quickly. As it does, we'll go ahead and get a little familiar with some of the other operations we can perform here, specifically around SMI conformance. One of those operations is to validate our service mesh's configuration, its conformance. It looks like OSM is coming up here. It looks like we're up. And we'll initiate conformance testing. There are, well, I lose count of how many conformance tests there are currently defined. The test assertions being used are not yet a complete expression of the spec, though they do address all four of SMI's APIs; the conformance tests are a work in progress, and as such, we will show the current set. While those are being run, I'm going to go ahead and take a look at some prior tests that were run just a bit ago. These were run earlier on Open Service Mesh. These tests go and do assertions across the same specifications we were looking at: traffic access, traffic split, traffic spec. It looks like we don't have traffic metrics being tested here just yet. But there are a number of assertions, things that OSM should do, and ways it should behave, to be conformant. And I wouldn't reflect on OSM in a negative way here; the conformance tests are early. Really, the notion is that each of these meshes needs to adhere to a certain set of assertions. Meshery then collects these results and will eventually be publishing them in combination with the SMI project. Nice. SMP is our next specification, our next standard. SMP is Service Mesh Performance. You know, it's very frequently been the case that as I engage and speak with people about service meshes, they are concerned about the overhead of a mesh, and rightfully so. Service meshes can do a lot, and the more that you ask any particular piece of infrastructure to do, the more CPU and memory you would expect it to consume.
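To give a feel for what a conformance assertion checks, here is a toy sketch, not Meshery's actual test code: under a configured 90/10 TrafficSplit, the observed distribution of requests across backends should land near the declared weights, within some tolerance.

```python
# Toy sketch of an SMI traffic-split conformance check (illustrative only).
# observed: request counts seen per backend; weights: configured split weights.

def split_conforms(observed, weights, tolerance=0.05):
    """Return True if the observed traffic shares match the configured
    weight shares within the given absolute tolerance."""
    total_obs = sum(observed.values())
    total_w = sum(weights.values())
    for backend, w in weights.items():
        expected_share = w / total_w
        actual_share = observed.get(backend, 0) / total_obs
        if abs(actual_share - expected_share) > tolerance:
            return False
    return True

# 1000 requests against a 90/10 split: close enough to the declared weights
counts = {"checkout-v1": 912, "checkout-v2": 88}
print(split_conforms(counts, {"checkout-v1": 90, "checkout-v2": 10}))  # True
```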
As such, really trying to characterize the performance of your infrastructure, of a service mesh, is a difficult thing to do, and articulating it concisely is difficult too, considering the number of variables you have to track and how hard it can be to have repeatable tests, to benchmark your environment, track your history, baseline your environment, and compare performance between meshes. You can go look at published results from some of the service meshes that do publish performance results (some don't), but what you'll find is that they're probably using an environment that isn't necessarily like yours, and they're also using different statistics and metrics to measure themselves, so it kind of doesn't help. And so SMP was born, in engagement with a few of those different service mesh maintainers, to create a standard way of capturing and articulating the performance of a mesh. In a future discussion, we'll talk about MeshMark. If you want to learn about what MeshMark is, visit the specification's website. It gets maybe more interesting than you might think. First of all, it's really rather challenging, and somewhat intriguing, how many things you need to measure and keep track of. It's not just the mesh, the mesh's version, its workloads, how many clusters you have, the size of the clusters, the type of nodes; it's all of that, and it's also the way in which you're configuring your control plane, your service mesh. You might be using client libraries to do some service mesh functionality. Maybe you're using those in combination with the service mesh, or maybe not. What costs more? What's more efficient? What's more powerful? These are all open questions that SMP assists in answering, and assists in answering in your environment. Maybe you're using WebAssembly and filters there.
Well, what's more efficient? Should you implement retries, as an example, or authorization, in one versus the next? Is one more powerful? What's the overhead of using one? There are a lot of approaches here to how you derive that value and how you put that value to work. There are a few examples here about path-based routing and context-based routing, just different load balancing algorithms. What are those going to cost you? The speed with which you can enable that in the mesh is pretty fantastic. You'd be surprised by some of the results of tests that we've done, and that the community has done in combination with a couple of universities and graduate students. We actually showed some of those results last KubeCon EU. We are going to do a short demo of SMP now. Okay. So I'm going to demonstrate Service Mesh Performance, or, more specifically, I'm going to show you the implementation of Service Mesh Performance. As Lee told you, Meshery, the service mesh management plane, is the canonical implementation of Service Mesh Performance. On my terminal, I have a local deployment of Meshery running. You can also deploy Meshery on Kubernetes, as well as on the vendor Kubernetes platforms like AKS, EKS, and GKE, or you can use a Dockerized container to run Meshery. I also have my Kubernetes on Docker Desktop, where I can just bring up the configuration, and you can see that on my Kubernetes I have Open Service Mesh deployed. And yep, you can see all the pods are running: controller, Grafana, Prometheus instances. So let's jump into the demo and see how the SMP spec works. The Meshery UI is exposed on port 9081. If you just go into the performance section, this is the UI which is used to instantiate a load test. So we'll just run a quick performance benchmark against Open Service Mesh. Here's the URL where I have exposed Open Service Mesh; I can just hit the performance endpoint for the same.
Over here you can see we have three load generators: Fortio, wrk2, and Nighthawk. All of these load generators have their own set of attributes which they record, and each of those attributes has its own significance. So let's just use Fortio and start the test. While the test runs, let's see what other options we can set in Meshery. In the advanced options, you can specify what headers should be passed while testing your services or service mesh. You can set the cookies, you can set the content type, and you can set the request body which should be passed while doing the performance test. And then, yep, this should be completed. Okay. So here's the output of the Fortio loader, and you can see the output is in Fortio format. You can actually download the test results from here, or you can just browse into the results tab and see all of the tests which you have run so far. So now we will be using Nighthawk to generate the load and benchmark the service in the same way, and later on we'll see how the different load generators generate load and how service mesh performance interprets the load generated using the different load generators. Here's the endpoint URL which I used earlier with Fortio. Let's start this. Nighthawk is the load generator which is maintained by the Envoy community. It is relatively new, and it still hasn't reached its 1.0 release, but Nighthawk already has sufficient features to compete with the other load generators in play. For example, it can stand up a gRPC service on its own, and it has some more attributes which you can access using its CLI tool. So here's the load test, and here's the result which was generated using Nighthawk. You can also see that Meshery has the capability to scan your environment and see what specifications you are using and what the load is on your Kubernetes.
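As a toy illustration of the latency statistics these load generators report (p50, p90, p99, and so on), here is a minimal nearest-rank percentile computation; each tool buckets and records latencies differently, which is part of why SMP defines a common result format. The sample data is fabricated.

```python
# Toy sketch: the percentile statistics that load generators like Fortio,
# wrk2, and Nighthawk report over their recorded latency samples.

def percentile(samples_ms, pct):
    """Nearest-rank percentile of a list of latency samples (milliseconds)."""
    ordered = sorted(samples_ms)
    rank = max(1, round(pct / 100 * len(ordered)))  # nearest-rank method
    return ordered[rank - 1]

# 100 fake latency samples: 1..100 ms
samples = list(range(1, 101))
print(percentile(samples, 50), percentile(samples, 90), percentile(samples, 99))
# 50 90 99
```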
So let's jump into the results tab and see how we compare the test results. Here's the one which I ran using Nighthawk, and here's the one which I ran using Fortio. If I select the Nighthawk one and click on download, you will see that a YAML gets downloaded in the SMP format, which you can browse over here. In the YAML, you see that the start time, the load, the latency percentiles, the metrics, and the environment are being captured. We plan to capture more details according to the specification in the service mesh performance repo; you can just go into the repo and see what attributes we are planning to capture. One more interesting thing we have in Meshery here: I can just select the results and compare them. Over here, if you see graph A, the blue line, this is the load which was generated using the Fortio load generator, and if you see B, the orange one, this was the load generated using Nighthawk. You also have a wrk2 performance test which you can see over here. Very good. Thank you, Kush. It always helps to see a specification in action; sometimes the abstraction is a bit too abstract for me. It becomes pretty obvious how SMP facilitates confidence and efficiency in how people operate a mesh. Our next specification to discuss is Hamlet. We mentioned Hamlet earlier; we talked briefly about Hamlet's focus on service mesh federation. Hamlet is an important specification, I consider. Like I said before, in a lot of respects, you might think of it as service catalog federation. Like the other specifications, Hamlet defines API interfaces, two APIs in Hamlet at the moment. One is the Federated Resource Discovery API; the other is the Federated Service Discovery API. There is much discovery happening in these APIs, and that's intentional.
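As a mental model only (Hamlet's real APIs are gRPC and protobuf based, and every name below is invented for illustration), the service discovery side of federation can be pictured as two catalogs exchanging records about the services each mesh owns:

```python
# Hypothetical, much-simplified model of federated service discovery between
# two meshes in separate administrative domains. Not Hamlet's actual API.

class ServiceCatalog:
    """One mesh's view of its local services and of remote (federated) ones."""
    def __init__(self, mesh_name):
        self.mesh_name = mesh_name
        self.local = {}    # fqdn -> endpoints this mesh owns
        self.remote = {}   # fqdn -> (owning mesh, endpoints) learned from peers

    def register(self, fqdn, endpoints):
        self.local[fqdn] = endpoints

    def export(self):
        # What this mesh advertises to its federation peers, in a common format.
        return [(self.mesh_name, fqdn, eps) for fqdn, eps in self.local.items()]

    def ingest(self, records):
        # Learn about services owned by another administrative domain.
        for mesh, fqdn, eps in records:
            if mesh != self.mesh_name:
                self.remote[fqdn] = (mesh, eps)

mesh_a = ServiceCatalog("mesh-a")
mesh_b = ServiceCatalog("mesh-b")
mesh_a.register("payments.example.internal", ["10.0.1.5:443"])
mesh_b.ingest(mesh_a.export())
print(mesh_b.remote["payments.example.internal"][0])  # mesh-a
```

The real specification layers authentication, authorization, and selective exposure on top of this exchange, which is what makes crossing administrative domains workable.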
The intention here is to marry up and connect disparate service mesh deployments. Those deployments might be of the same type, or they might be of two different types. Hamlet takes on a client-server architecture in which resources and services of one service mesh are discovered and registered, and, using a common format, information about them is exchanged between different service meshes. Part of the real power here, as I consider it, is the ability to overcome what are likely to be separate administrative domains. There is likely one team that controls this service mesh over here and the services it is running, and another team that controls that one. The ability to define rules around authentication and authorization, rules around which services get exposed and to whom, who can communicate with them, and whether or not they can do it securely: these are things that Hamlet addresses. In addition to SMI, SMP, and Hamlet, there has been an emergence of service mesh patterns: patterns in the way that people are running and operating meshes. Service meshes have been around for four years now. As part of the work that we are doing inside of the CNCF SIG Network, there is a service mesh working group. That working group is helping identify those patterns, of which there is a list right now. Unbeknownst to you, this is a clickable hyperlink; when you have access to the slides, you can click on it. If you don't find the slides, reach out to Kush, reach out to myself, or better yet, reach out and join the service mesh working group. All are welcome. You don't have to be a member of the CNCF to participate; come join in and help us work through the 60 patterns that are defined right now. Thirty of those are going into a book called Service Mesh Patterns. With that, it's been wonderful talking service mesh with you, talking abstractions and specifications. Thank you for your time.
Kush and I will be in the chat, ready to field your questions if we are out of time and don't get to them. Come join the Layer5 community. Come join the Slack. Each of the service meshes that we have discussed today has representatives in that community, and there are many people there who have adopted and are operating service meshes as well. A lot to be shared, and to share. Okay. Thanks a lot. We will see you in the chat.