Hi, my name is Rosemarie Wang, and today at .NET Conf I want to talk about stretching the service mesh. This is a really interesting story for me because it was inspired by a situation where we were working with .NET applications and we wanted to use a service mesh. The problem was that we really weren't sure whether it was good or bad, so we remained pretty neutral on that. This talk is an investigation into what it really meant for us, what it could have been, and what kind of struggles we might encounter.

Right now I don't work in .NET as much as I used to. I'm a developer advocate at HashiCorp. HashiCorp builds a set of open source tools, including Terraform, Vault, and Consul. Today we're going to take a look at Consul specifically as a service mesh. Most of what I do is facilitate any questions or feature requests that come from the open source community. So if you have any questions after this webinar and you're looking for more information about Consul, service mesh, and .NET, or you want examples, et cetera, I'm happy to answer those questions.

I wanted to start with what the buzz is about service mesh to begin with. Service mesh seems to be a big buzzword everywhere, and when I started to look into it, it seemed like it was, well, just another buzzword that someone was throwing at me. But I wanted to examine two main issues that a service mesh might solve: tracing, which is how we gain visibility across transactions, and traffic management, which is how we shape traffic and how we control it. At the end, we'll talk about what we can conclude from some of these exercises. So there'll be a little bit of demo and a little bit of thinking. This is not going to be a deep dive into service meshes and all of their capabilities; each of these could probably do with its own 25-minute talk.
But this is just a quick overview and understanding of what it might look like.

So what's the buzz about service mesh anyway? Well, when I first heard about it, it was in the context of "we need a service mesh," but we were actually looking to solve three problems. First, we couldn't trace transactions. At the time, there was a private data center and there was a public cloud, and we wanted to trace transactions across both. Next, we didn't really have a way to control the traffic. When we wanted to do a canary, for example, we had to configure 50 things on the data center side and then 50 things on the public cloud side to test the canary. The last bit is that we couldn't secure every application. I'm not going to go through this one today because it can be done in many different ways, but the idea was that we wanted to encrypt all east-west traffic. So everything had to be TLS encrypted, and that's a great feat to accomplish, one that you could potentially do with the service mesh, or you could control network policy with the service mesh as well. But we ended up not really thinking about that too much, so I'm going to look at the first two.

The other piece of this was that we needed a service mesh because Kubernetes has one. We were running partly in Kubernetes, but we were also running in a private data center. That happens a lot too. But the service mesh in Kubernetes, specifically Istio, doesn't really stretch. It doesn't go to the private data center. You can configure some things to do that, but at the end of the day, you might end up configuring some extra stuff to accomplish it. With this premise in mind, we wanted to say: we're going from monolith to microservices, we're going from private to public cloud, or we're running in both. Some things could run in public cloud, and some things could run only in the private data center. So what do you do? How do you gain visibility across all these things?
How do you even control traffic across all of these as well? Well, it's a great question.

Today we're going to use this example. They're actually not really complicated applications. It's just an expense report application: there's report, which generates the report and outputs a total, and then there's expense, which tracks the expenses. There are two versions of expense. Maybe there's a currency implementation in the other one, or we're strangling it out, or originally the expense app was a monolith in the private data center. And it's all backed by a Microsoft SQL Server database.

So after all of this ideation, saying, okay, we need a service mesh because we want the visibility and we want the traffic control, it kind of just came down to: do we need a proxy to control it, or do we just need a library? In the Java space, there's a pretty large set of tools that allow circuit breaking; there's a lot of traffic management tooling, and there's a lot of tracing tooling. So can we accomplish that in .NET as well? That was a big question for a number of engineers. Or maybe we don't even need to implement any of the code. We can use a service mesh, which is that abstraction of function. The service mesh accomplishes this by putting a proxy in place: every service is fronted by a proxy that runs right next to it, and all of the traffic gets funneled through it. So do we do it with a proxy, a service mesh, or do we just use a library? Can it be simpler?

So I decided, well, let's actually take a look at this. In order for us to stretch it, in order for us to gain visibility across public cloud and private data center, what do we need to do to make tracing a reality? As a little disclaimer, tracing is non-trivial. You could probably spend a whole hour talking about tracing. I don't have a whole hour.
So I'm just going to talk about what I can do with tracing for now, as a quick overview. What I wanted to do was get started with tracing as quickly as possible. I have an ASP.NET Core application. I didn't really have the network right now to actually implement the database in a public cloud, so that's TBD. The library for tracing uses the OpenTracing specification. If you haven't kept up with the tracing space, don't worry about it. Briefly, OpenTracing is the interface, and the implementation of the interface can vary. It could be Jaeger, it could be Zipkin, et cetera. In this case, it was really easy to use Jaeger. All I really did was configure this snippet that says add OpenTracing and add a tracer singleton, and that was pretty much it. There was some configuration of the tracer itself. The tracer says, okay, I need to point to a sampler, which is Jaeger, and I also need to point to the Jaeger connection. But overall, it was pretty straightforward and not terribly difficult, which was really neat to see.

From a code standpoint, you'll see that in the code itself I have to build each span. I have to tell it, okay, this is a span that gets the expense by ID. So it doesn't forgo my need to insert this tracer code into my application, but it was really easy to inject the tracer into all of my classes. It's not the worst-case scenario in terms of configuration.

So I'm going to create my stack here. I'm just using Docker. The idea is that I didn't really want to run into networking issues, which has happened today, so what I'm going to do now is just spin it up locally. It's got a set of three applications. Well, it's got one database and two applications. So it's got my expense database.
It's got expense version one, and it will eventually have expense version two, and it will have the report. So all of these things are coming up. If I examine my browser, I can see that Consul has come up. I'm using Consul as my service mesh. Consul has an Envoy proxy, so that's why you'll see Consul configuration, but really what we're focusing on is the configuration and whether or not it gets passed to Envoy. Here, what we're seeing is that all of my services are up, as well as their proxies.

When I check Jaeger, which is going to be my tracing implementation, I'm just going to examine whether or not it gets a call. So let me open up a new tab, and I'll just call expense. This will give me a list of expenses. Right now I don't have any. That's fine. But when I look at Jaeger, the implementation is really, really useful. It injects all of the spans for me, including the child spans, which is really tricky. Here you can see I'm listing expenses, and you can see that it's getting results from SQL. Unfortunately, clock skew is messing with the span, so it's supposed to be here. But anyway, the idea is that I can see this pretty easily, and from a code standpoint, it's not much code. It's really elegantly and cleanly done.

Let's say I want to look at report. Maybe I want to look at a specific trip. In this case, I don't have any trips, but for now I'm just going to get it. Oops, sorry, wrong endpoint. My report is on 5002. So can I actually see the span across report and expense, because report calls expense? And the answer is yes. In some ways, it's a really straightforward implementation from a tracing library standpoint. So overall, really elegant.

All right, so that's not so bad from a library standpoint. It's pretty neat code. So let's try to do it in the service mesh, right?
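As a rough illustration of the parent and child spans seen in the Jaeger view, here is a minimal in-memory sketch, written in Python for brevity rather than in .NET. The `Span` and `Tracer` classes and the operation names are hypothetical and are not the OpenTracing or Jaeger API; they only show how a child span ends up sharing a trace ID with its parent and pointing back to it.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One timed operation within a trace (hypothetical, not the OpenTracing API)."""
    operation: str
    trace_id: str
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_id: Optional[str] = None
    start: float = 0.0
    end: float = 0.0


class Tracer:
    """Collects finished spans in memory instead of sending them to a backend."""

    def __init__(self) -> None:
        self.finished: List[Span] = []
        self._stack: List[Span] = []  # spans that are currently open

    @contextmanager
    def start_span(self, operation: str):
        # A span opened while another is active becomes that span's child.
        parent = self._stack[-1] if self._stack else None
        span = Span(
            operation=operation,
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            parent_id=parent.span_id if parent else None,
            start=time.time(),
        )
        self._stack.append(span)
        try:
            yield span
        finally:
            span.end = time.time()
            self._stack.pop()
            self.finished.append(span)


def list_expenses(tracer: Tracer) -> list:
    # The handler marks its own span, then a child span around the database call.
    with tracer.start_span("ListExpenses"):
        with tracer.start_span("sql-query"):
            return []  # stand-in for the SQL Server query; no expenses yet


tracer = Tracer()
expenses = list_expenses(tracer)
for span in tracer.finished:
    print(span.operation, span.trace_id, span.parent_id)
```

The point of the sketch is that each span still has to be marked in application code, which matches the experience described in the talk: the library makes it easy, but it does not remove the tracer code from the application.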
At first I was like, all right, I'm just going to take what I have and funnel it to Consul and Envoy, my service mesh. The idea is that I should see the proxies log the spans. So in lieu of the spans that you see today, which are SQL client, et cetera, I should instead see the service names, like Consul, going to expense, going to expense DB Consul and then expense DB. I should see that sequence.

But unfortunately, the Envoy implementation in Consul only uses Zipkin spans, the Zipkin format. And Zipkin and .NET was a little tricky. I tried Zipkin with .NET, and then I tried OpenTelemetry. It didn't quite work. OpenTelemetry, by the way, is kind of the mix of OpenCensus and OpenTracing. It's a way to consolidate the specifications, basically. And then I tried Zipkin with OpenTracing, and that implementation didn't quite work either. So all across the board, it was just confusing. It ended up being more difficult in some ways to configure it for a service mesh that would work across private data center and public cloud than it did to just inject the OpenTracing library.

Now, is all of that OpenTracing implementation available for all versions of .NET? Of course not. So you may find yourself saying, oh, I wish I had visibility into the database, something like that. In which case, you could end up doing Zipkin and adding that in, but it turns out you still need libraries and code to mark spans. You can't customize spans within a service mesh that easily either. So you end up using a couple of code tracers just to mark some spans for yourself.

There's a lot of movement in the space though, so I expect it to stabilize a bit on the approach. As I mentioned before, OpenCensus plus OpenTracing is OpenTelemetry. There's not really a stable library for .NET with an OpenTracing interface and a Zipkin implementation on the back side. It's just not there yet.
And then there's Envoy 1.12, which supports Jaeger native tracing. So there's a potential that this might actually get easier and I might not have to change any of this code. In the meantime, I intend to keep it as a library.

All right, the examination of traffic management. Traffic management is a big piece of the puzzle that I think can be very difficult to configure across the board, mostly because in the private data center you'll have to configure 50 things, and then in public cloud you'll configure 50 things as well.

So, circuit breaking. Circuit breaking is kind of complicated, but at the end of the day, it boils down to: when I get some number of 500s, stop and immediately return an error. This is something that's more prevalent in the microservices space, although it's really useful with a database or with the strangler pattern too, mostly because when you're trying to hit a private database that's not always completely stable, the circuit breaker is really useful to stop requests from cascading everywhere. So in this case, we'll have report stop when it gets 500s three times from expense.

There are a couple of circuit breaking libraries in .NET that you can look at. Someone referred me to Akka.NET, and there's Polly. The circuit breaking that I found ended up being kind of write-your-own-logic, write-your-own-adventure circuit breaking, where you take some function, wrap it, and then put it out as your own library. So people do write code for this. Circuit breaking in the service mesh space, however, can be a little easier to configure. I embedded two snippets. Circuit breakers in Envoy are a very simple kind of circuit breaker, where it reaches the threshold and then cuts it off. That's pretty much it. The more sophisticated approach to circuit breaking is called outlier detection in Envoy.
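The "three 500s from expense, then stop" behavior described above can be sketched roughly as follows. This is a simplified illustration in Python, not Polly, Akka.NET, or Envoy. The class and function names are made up for the example, and it leaves out the reset or half-open behavior that real circuit breakers usually add.

```python
class CircuitOpenError(Exception):
    """Raised when the breaker is open and the upstream call is not attempted."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.consecutive_failures = 0

    @property
    def is_open(self) -> bool:
        return self.consecutive_failures >= self.max_failures

    def call(self, request):
        """Run request() unless the breaker is open; request returns an HTTP status code."""
        if self.is_open:
            # Fail fast: return an error immediately instead of calling upstream.
            raise CircuitOpenError("too many consecutive 5xx responses")
        status = request()
        if 500 <= status < 600:
            self.consecutive_failures += 1
        else:
            self.consecutive_failures = 0  # any success resets the count
        return status


# Hypothetical stand-in for report calling an unhealthy expense service.
upstream_calls = []


def failing_expense() -> int:
    upstream_calls.append(1)
    return 500


breaker = CircuitBreaker(max_failures=3)
results = []
for _ in range(5):
    try:
        results.append(breaker.call(failing_expense))
    except CircuitOpenError:
        results.append("open")

print(results, len(upstream_calls))
```

After the third 500, the remaining calls never reach expense, which is the cascading-request protection the talk is after.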
These are JSON snippets that are passed to the Envoy configuration, and I'll show you how they're funneled through Consul. The outlier detection basically says that if a number of hosts come back with consecutive 500s, those get removed from the pool. So it's a more elegant way to determine, hey, is there something wrong with this particular subset of hosts?

So I'm going to show some of this configuration in Consul specifically. What you'll notice in the report application is that report is my downstream. It calls expense, which is upstream. In my downstream, what I'm really looking for is this cluster JSON. This cluster JSON gets passed to the Envoy cluster configuration, and it has this outlier detection applied at the cluster level. It's a little bit confusing, but the idea is that a cluster represents an Envoy endpoint, or an endpoint with Envoy proxy metadata. So the endpoint is pointing to expense, which is my expense application. On top of that, I also have up here the more simplistic version of the circuit breaker, which is just on report, and there's one on expense as well. The idea is that if there's a maximum of 10 requests to report, it just cuts it off. You'll get some failures in this.

Overall, most of this seems pretty easy, but in terms of debugging, it was actually pretty tricky. Getting these configs to work does take a lot of time. So you might find yourself, from an operational standpoint, saying, hey, this implementation of the proxy and passing this through the service mesh is easy to get started with, but if something needs to be debugged, you might find yourself spending some time debugging. That's just one thing to note.

So let's go to something that's even more traffic shaping related, right? Which is canary.
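To make the outlier detection idea from this section more concrete (hosts that return consecutive 500s get removed from the pool), here is a minimal sketch in Python. It is not Envoy's implementation or configuration. The `HostPool` class, the host names, and the threshold of three are assumptions for illustration, and the sketch never returns an ejected host to the pool, which a real proxy would typically do after some time.

```python
from typing import Dict, List, Set


class HostPool:
    """Tracks consecutive 5xx responses per host and ejects hosts over a limit."""

    def __init__(self, hosts: List[str], consecutive_5xx: int = 3) -> None:
        self.consecutive_5xx = consecutive_5xx
        self.failures: Dict[str, int] = {host: 0 for host in hosts}
        self.ejected: Set[str] = set()

    def healthy_hosts(self) -> List[str]:
        """Hosts that are still eligible to receive traffic."""
        return [h for h in self.failures if h not in self.ejected]

    def record(self, host: str, status: int) -> None:
        """Record one response from a host and eject it if it keeps failing."""
        if 500 <= status < 600:
            self.failures[host] += 1
            if self.failures[host] >= self.consecutive_5xx:
                self.ejected.add(host)
        else:
            self.failures[host] = 0  # a success breaks the consecutive run


# Hypothetical pool of two expense instances; only the second one is failing.
pool = HostPool(["expense-1", "expense-2"], consecutive_5xx=3)
for _ in range(3):
    pool.record("expense-1", 200)
    pool.record("expense-2", 500)

print(pool.healthy_hosts())
```

Unlike the simple threshold breaker, this keeps traffic flowing to the instances that are still healthy and only cuts off the subset of hosts that is misbehaving.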
Let's say I have my strangler pattern here and I want to test whether version two of expense, which is my new shiny expense microservice, is good to go and actually works. So maybe I want to canary and say, okay, let me only test it with a certain group of people, or I only want to test it with a header.

So here, what I'll do is create my second application. I'm going to start up a second instance of my application, and the second instance is just expense version two. Here you can see where expense and the expense sidecar proxy are configured as version two.

Now that I have the second version, I'm just going to make a call to my endpoint. I need to go through the proxy in order to do this, so what I'm going to do is go through report to call back to expense. The idea is that maybe you want to examine whether or not this endpoint is valid. So if I do localhost 5001 and I add the header, for example, testgroup B, oops, API, you'll see version two. What I'm basically saying is, I'm looking for version two. Now, if I do this from my report application, right now it'll only go to version one.

But what I'm going to do next is show a little bit of traffic shaping. Let's say I did a little bit of canary, I did a little bit of A/B testing, and I look at this and say, okay, now maybe I'll do some splitting. So I'll start to do blue-green, like 50% blue, 50% green maybe. Here I'll have a service splitter. Notice this is not Envoy JSON.
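The two traffic shaping ideas in this part of the demo, header-based canary routing and a weighted blue-green split, can be sketched roughly as follows. This is plain Python for illustration, not Consul or Envoy configuration. The header name `testgroup` with value `b` is an assumption based on what is said in the demo, and the function names and subset names `v1` and `v2` are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def route_by_header(headers: Dict[str, str], default: str = "v1") -> str:
    """Canary rule: only requests carrying the test header reach version two."""
    if headers.get("testgroup") == "b":
        return "v2"
    return default


def split(weights: List[Tuple[str, int]], rng: random.Random) -> str:
    """Weighted split: pick a subset in proportion to its weight."""
    subsets = [name for name, _ in weights]
    return rng.choices(subsets, weights=[w for _, w in weights], k=1)[0]


# Header-based canary: tagged requests go to v2, everything else stays on v1.
print(route_by_header({"testgroup": "b"}))
print(route_by_header({}))

# 50/50 blue-green split over 1000 simulated requests (seeded for repeatability).
rng = random.Random(0)
picks = [split([("v1", 50), ("v2", 50)], rng) for _ in range(1000)]
print(picks.count("v1"), picks.count("v2"))
```

Changing the weights to something like 98 and 2 turns the same splitter into a percentage-based canary, without touching the application code that makes the request.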
Before, in the circuit breaking, we had JSON, and that was Envoy JSON being passed from Consul to the Envoy proxy. The service splitter is not that; it wraps around all of that. Consul takes this configuration and creates the Envoy JSON for you, so it's a little bit easier. In terms of the traffic split here, I'm going to do 50% version one, 50% version two. Simple, right? What I should see when I examine this is that half the time it will go to version two and half the time it will go to version one. So here are the splits that I would use to arrange my traffic.

I would also need to ensure that I have subsets configured. This is the service resolver, and it also gets passed to Envoy. And there's also a router. The router is what I showed earlier, which is the canary. Here I can say there's a certain header, test group AB. You could also use this for A/B testing if you want. You can use a service splitter for canary too if you want, say, 2% of traffic diverted there. So there are a lot of ways you can configure it, but the idea is that these abstractions are all handled by Consul and then passed to Envoy in your service mesh.

So in some ways it's not that bad to configure. If I think about it, did I want to put this in code? Maybe not. Maybe I want to put this in my CI pipeline and configure it there, or maybe I want to be able to configure it separately. The idea is that I don't necessarily need to touch the body of my code to do this. I can run this separately, outside of it, and I can test expense locally.

The other really neat part about having the proxy there is testing locally. A lot of the application settings, as you might realize, are a little bit tricky to configure. Here, you're configuring just localhost. Well, that's weird. Why are you going to localhost? Well, the proxy is actually sitting on localhost, right?
As long as the proxy is sitting on localhost on port 19000, 20000, whatever you desire, then if you say localhost 1433, which is Microsoft SQL Server, it will automatically funnel this to the correct Microsoft SQL Server instance through the proxy. The proxy reads localhost and will map that to the service for you. So localhost gets mapped to, let's say, the expense database. It's not like you have to put the expense database in your app settings anymore. That's really neat because it means I can test this locally without really changing anything. It also applies whenever I put it in my actual application hosting entity, whether that's a virtual machine or a container. I just reference localhost, and it does the port mapping and service discovery for me when I use a service mesh.

All right, so we talked a little bit about canary. Here's a snippet of what it might look like: a test group, for example, for A/B testing, or just whatever amount of traffic you want to divert. And we briefly looked at traffic splitting, which is the splits. All of these are pretty easily configured, and you can just pass them through.

So after all this, what are we really concluding? Well, tracing is still non-trivial. I still find it really difficult. Library and mesh compatibility requires some investigation, so it's kind of tricky. I had some real trip-ups trying to determine what's compatible where. Maybe the greatest value of having a service mesh is putting the proxy in front of the database. If you put a proxy in front of the database, you're able to track a request to the database within your span, and that might be your greatest value. But with the library being pretty straightforward to use and implement, and covering a number of .NET applications, you might find yourself just using the library anyway.

Traffic shaping. It's worth attempting.
In a typical public cloud blue-green on the infrastructure side, you would have to configure the load balancer. In this case, maybe you can just use layer 7 features to say 50% goes here, 50% goes there. It's pretty simple, and the service splitter is just a configuration that you apply. There's less configuration within the application code itself, so I don't have to include the Polly library or all the other libraries, and I can control some of the traffic based on intent. So in terms of realizing this reality, traffic shaping might be worth trying if it's something that you really need and might find useful within the application space. If you have a very extensive stretched workload that goes from private data center to public cloud and you do a lot of it, traffic shaping might be worth trying.

As usual, I have some references at the end. There are some docs from Envoy Proxy. Shout out to Nick Jackson, who's my teammate. He does a lot of Consul-related work, and he has some great examples of all of the features included here. They're unfortunately not .NET, so if you're interested in .NET-specific examples, I have a repository here for you to examine. Thank you all.