Okay, good afternoon everyone. Great to meet you. My name is Adam, and I have Liam with me; we're from Tetrate. We're super excited to talk about how we can turn our cloud native applications inside out using a service mesh. By way of introduction, I'm part of the solution engineering organization at Tetrate, and a long-time proponent of and contributor to open source in the reactive, Cloud Foundry, Spring, and Kubernetes ecosystems. And like I said, I have Liam with me; I'll let him introduce himself.

Hi, yes, I'm Liam. I'm a software engineer at Tetrate, where I lead the cloud team. I'm also an Istio maintainer, so I'm that rare combination of Istio maintainer and user, which provides some interesting perspective.

Great. So what we're going to cover today is, first, a little history of what we view as the common patterns and building blocks used for building cloud native applications, specifically using the Spring ecosystem, Spring Cloud, and some of the IP that Netflix contributed to open source. Then we'll cover how a service mesh fits in with this type of architecture and this approach to Java applications. Lastly, we'll walk through a small migration example of how we go from using some of these Netflix libraries to introducing a service mesh into the app. Now, we're going to move through a few code and architecture examples pretty quickly, but you'll notice there's a link to a GitHub repo that has the before-and-after code, which you can take a look at and maybe even try to run and use as an example.

As we start out, I definitely want to preface this whole presentation by saying that Tetrate and I all heart Spring and Spring Cloud. The takeaway should not be "abandon Spring Cloud," or "abandon Spring and go a different route, using service mesh as a completely new alternative." The point is to show you how this can augment and support your Spring
and Spring Cloud applications, potentially unlocking some new capabilities that maybe you don't have today, and making a few things a little easier for you, your developers, and the operators of your cloud platform.

So how did we get to this point? We're looking at the end of a timeline, here in 2020 — or now 2021 — where I see most organizations asking: how do I build microservice or cloud native applications, and then spread them across clusters and across multiple clouds? We arrived at this point because there have been a number of very important and interesting contributions to open source, and technologies that make building cloud native applications and microservices a little easier. It goes all the way back to 2012, when Netflix realized they needed to build smaller services, iterate over them quickly, and optimize for velocity — and then became very vocal about the patterns, the technologies, and the ways they did this, and even open sourced many of the supporting libraries. Then came the emergence of Docker and containers, and container scheduling platforms like Kubernetes and Cloud Foundry, and we really saw the standard — or the buzzword — of microservices become the way to go for building applications. Around 2014, the Spring community made it very, very easy to start building these types of applications with Spring Boot, and later Spring Cloud, which embodies and releases some of those Netflix contributions in the way Spring developers have grown to love: convention over configuration and opinionated, out-of-the-box defaults that get you started really quickly. A few years later, Istio was open sourced and released as 1.0, and we'll talk about why that's important when it comes to cloud native applications.

Now, the ingredients — or why we actually do this: the fundamental thing when we move to microservices, or cloud native, or just smaller units of work, is that we end up in the
situation where everything we depend on, and everyone who depends on us, communicates with us over the network. This is especially problematic if the network — or, we could say more broadly, the compute, the cloud, the infrastructure we're running on — becomes more dynamic, and maybe somewhat less resilient in terms of any one individual unit of compute. And we don't want to lose any resilience. Take the example of an application that depends on 30 different microservices to build a complete experience for the end user: if each of those services has an uptime of four nines, and we multiply that across 30 services, the overall experience is only slightly above two nines of availability — even if every single service meets its objective or SLA. That's why I love this description from one of my friends and former co-workers, Duncan, who describes cloud native this way: we build software that's designed to run and scale reliably and predictably on top of unreliable, cloud-based infrastructure — or we could also say dynamic cloud-based infrastructure.

That leads us to build into our application — and since we're talking about Spring applications and Java applications, built right into our JVM — patterns for: how do I discover the services I need? How do I load balance across them? How do I do that in a resilient way, and fail fast when problems occur? And then, how do I get visibility into what's taking place, with metrics and telemetry, and trace calls from service to service to service? That's really what's needed to begin to build cloud native applications that are going to be spread across multiple clusters, multiple clouds, and multiple environments. And so how would we actually do this in practice?
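As a quick aside: that availability arithmetic is easy to check for yourself. Here's a minimal sketch — plain Java, not from the talk's repo — of how availability compounds across dependent services:

```java
// Compound availability: if a request fans out to N services and needs
// all of them to succeed, the end-to-end availability is the product of
// the per-service availabilities.
public class CompoundAvailability {

    static double compound(double perServiceAvailability, int services) {
        return Math.pow(perServiceAvailability, services);
    }

    public static void main(String[] args) {
        // 30 services, each at four nines (99.99%).
        double overall = compound(0.9999, 30);
        // Prints roughly 0.9970 -- only a bit above two nines (99%),
        // even though every individual service met its SLA.
        System.out.printf("overall availability: %.4f%n", overall);
    }
}
```

So even before any real failures occur, simply depending on many services over an unreliable network eats into your error budget — which is why the fail-fast and resilience patterns below matter so much.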
Well, here's a super simple snippet of code. If I were building an application like this, I'd go to start.spring.io, which really helps me get started — including the dependencies in either my Maven or Gradle build — to define my application. Then, with just a couple of annotations, I can indicate that this application, which models a to-dos application, is going to have some circuit-breaking capabilities. I'm able to wire together a RestTemplate, which makes calls over the network, and load balance across endpoints that, under the covers, are resolved using some sort of service discovery mechanism. And as you see from the URL for my dependent service, which is called todos-redis, it's able to resolve that to endpoints transparently to my application. Lastly, I'm able to add a couple of annotations so that when things get slow I can fail fast, and when things back up I don't have a cascading failure in my application.

If we lay that across how we'd actually deploy it on a Kubernetes cluster, I'll point out a couple of common ingredients we might have in an architecture like this. First, we probably bring traffic into our applications using a standard Kubernetes ingress controller, and that will typically land at a Spring Cloud Gateway, which knows how to talk with Eureka, our service registry, to find out where our web UI or API services are. Our API service is also going to talk with that registry, both to register where it's running and to find its dependencies. And lastly, since almost every application I see in the wild today runs across multiple clusters, this Eureka instance is probably going to be peering with other Eurekas in other clusters, to provide service discovery for either replicas of the same application or its dependent services.

So this is a pretty standard architecture, but there are a couple of gaps or challenges that arise with it. First, this is really
optimized for the JVM, for Java applications, because the libraries that solve these problems are built right into the application. So the polyglot experience is really less than ideal if you're writing, or want to build, a service in something other than Java. Secondly, since it is in the application, you have very tight coupling: these cloud patterns — I'd even call them network or platform concerns — are coupled right to your code, right to the artifact you're going to deploy, whether that's the jar or, more broadly, the container you're shipping. Then, usually, some of the semantics of how to solve these problems end up creeping into your CD processes, and that adds overall complexity, because you have platform- and infrastructure-level concerns being solved right next to your business logic. Also, these patterns don't have the most robust semantics and primitives for multi-cluster or multi-cloud definitions. It becomes hard to say, "if you're running right next to me, here's how you discover me, but if you're not running in my cluster, you should discover me a different way" — it gets complex really quickly. And lastly, there's a certain set of applications this just won't be applicable for: if you can't rewrite your application, or at least modify it enough to include these libraries, it won't be able to participate — and commercial off-the-shelf software certainly won't run in this manner.

And that's where the importance of a transparent layer that solves some of these concerns comes in — a transparent network layer, which my colleague Liam is now going to talk about, and how a service mesh can bring that to the table.

All right. So: a transparent network layer.
If you go and look at the Envoy docs, you'll see that one of the core tenets of the project is that the network should be transparent to applications, and that when network and application problems do occur, it should be easy to determine the source of the problem. There are two parts to this. Obviously, there's the transparency. What this means is that the application shouldn't be aware, or even care, that it's part of a service mesh — and by extension this includes developers. If I'm a developer, my job is to produce business value; I shouldn't have to care about mutual TLS — it should just be done for me. I might care about retries or timeouts, because those are maybe application specific, but beyond that I don't care: I just want to solve business problems. The second part is the visibility. Because Envoy is a TCP and UDP proxy that comes with an understanding of how to deal with HTTP/2 and HTTP/3 (for QUIC), it knows everything about the network communication between your services, and it can use that information to make visible to everyone what's happening within and across your services.

So how does it solve this problem? On the left, we've got the Netflix OSS model, where we bundle discovery, load balancing, traffic management, resiliency, metrics, and tracing into the service itself as a library. When we're using Kubernetes and Envoy, we extract that functionality out into Envoy itself, and Envoy runs as a separate process alongside your service — in Kubernetes, in the same pod but in a different container. The key is that it's a separate process. So what does the request flow look like?
Well, a request comes into a pod; it gets intercepted by Envoy and forwarded on to the service. When the service makes outbound requests, they go in the reverse direction. So, like I mentioned, it's a separate container in the same pod, and they share a network namespace in Kubernetes — so we can do all kinds of messing around with iptables, and that's how we intercept traffic transparently in Istio.

Envoy refers to this as an out-of-process architecture, and what this really means is moving the logic from libraries within the code out to a separate binary that can run as a separate process, or a separate container — technically those two things are the same thing in Kubernetes and Docker — but within the same pod. Doing this has quite a few benefits. First, moving this out means it works with any language. Like I mentioned, it works at the network level, speaking TCP or UDP. You might have some crazy custom protocol, but you can extend Envoy to actually speak that protocol, if you can write something that compiles to Wasm. If you're a purely Java shop, this might not be much of an advantage; however, the next one is that it works with any legacy and third-party applications. You have Postgres, you have Redis, you have some legacy system running on a mainframe that you never want to touch: we just deploy Envoy alongside it — or, if it's on a mainframe, maybe we deploy Envoy as an egress gateway on the way out to the mainframe — and then we have the same visibility and control of that traffic as we would for a modern cloud native application. The next advantage: upgrades of libraries are painful, right? In a containerized world, you have to rebuild all your images — probably not recompile, but rebuild — and then make sure each of them gets deployed. Sometimes this requires coordination with the dev team.
Sometimes it doesn't. But if we decouple this functionality so that the platform team can just roll out security fixes and new versions of Envoy, application devs can be none the wiser — and again, they can just focus on actually writing the core business logic.

The other advantage of this separation is that Envoy-based service meshes — pretty much all of them — are API driven, much like Kubernetes is. In the same way that, if you needed to scale out capacity — to increase the number of replicas for a specific deployment because you're pegging your CPU — you can just scale up via an API change without changing anything else, we can do that with an Envoy-based service mesh as well. Maybe something is slow and taking 11 seconds to respond, and you have a 10-second timeout: you can make a single config change to an API, and that can automatically increase your timeout, so that you're no longer having an outage — you're just extremely slow. And extremely slow is usually better than no requests at all.

Now, there is one disadvantage, which is fairly obvious from an out-of-process architecture, and that's latency. Moving out of process means that when you're making a request, you're now going through the network stack within the kernel; you're crossing between kernel and user space an extra two to four times. And this legitimately does increase latency — except in some circumstances. I have seen examples where, for instance, Python's TLS implementation is basically less efficient than using Envoy.
So if you're doing TLS in Python and you move it to Envoy, sometimes you can see a speed-up — but generally you will see an increase in latency. This is an issue if your workloads have strict latency requirements. The gRPC team is currently working on implementing the Envoy APIs, so on latency-sensitive paths you can choose to use gRPC natively: you won't have to do the kernel/user-space network stack jumps, you just do the thing. I think Cilium as well has some things that basically prevent you from needing to go from user space into the kernel and back out again when you're doing this network hop — you'd need to ask them for the details; I haven't been paying too much attention to what they've been doing, but I believe they offer some things specifically for Envoy. Next slide.

So, moving this out of process gives us consistency, and that breaks down into four things. The first is traffic management. It doesn't matter whether it's a third-party application — Postgres, Redis, a legacy application, a homegrown non-Java application — you use the same API to configure all of the traffic management. This functionality includes retries, circuit breaking, request shadowing, session stickiness, locality load balancing — keeping your AWS or other cloud provider network costs down by keeping your traffic within the same AZ where possible — canarying, A/B tests, and fault injection. Whatever Envoy functionality you need to leverage, it's the same API for configuring all of those things. It's also the same API for configuring security and policy.
So again: third-party or not, it doesn't matter — same API. If you're using Istio, we leverage Envoy's SDS, the secret discovery service, and we allow you to incrementally adopt mTLS everywhere. We handle certificate rotation for you, and we issue identities with lifetimes measured in hours or days, not weeks or months. We do this because certificate revocation is extremely painful, so we just use short-lived certificates. And with that, we have a consistent way of writing policy for all of your applications.

The next one is behavior. If you have a bunch of implementations across different client libraries — you might have a Java library that does retries, a Go library that does retries, maybe Postgres has its own thing, and maybe you have a legacy application that doesn't really do timeouts or retries at all — the behavior is going to be different across all of those implementations, or it might not be there at all. Because we have the same binary everywhere (save for the gRPC work we mentioned earlier), we get consistent, and more importantly predictable, behavior — and predictable behavior is always better. Not only that: like Adam mentioned, we can put Envoy at the ingress gateway, so we have that consistency there too.

The same goes for telemetry, because Envoy produces the same metrics for the same protocol regardless of whether it's a third-party application or your own. We're going to have the same metrics, with the same metric names. We don't have to coordinate metric names across teams.
They're just given the metric names, so we can generate dashboards — usually via code, via templating — because the names are always the same and the attributes are always the same; we can just auto-generate all of that.

Okay, so let's now take a look at how we could apply some of the things Liam just mentioned to our same application. Again, we're going to go through a few examples pretty quickly, on how we can enable mesh capabilities around ingress, service discovery, client-side load balancing, some resiliency capabilities, and security. Grab the actual code examples from our GitHub repo so you can see them working for yourself. Now, with this updated architecture there are some obvious changes we've made: the client libraries are no longer in the Spring Boot application; we now have an Envoy instance paired with every single application container running in Kubernetes. Secondly, our Kubernetes ingress has changed: we're able to use Envoy itself as an ingress gateway, and have some of the capabilities we've been talking about at the furthest edge of our cluster. And lastly, you'll notice our service registry has actually disappeared. Inherently, whatever control plane we're using to program the mesh — Istio here — is able to provide information about where services are running and their endpoints, so service discovery is built into the platform.

So let's talk about the first item: ingress and service discovery. In our Spring Boot application, there are a couple of things we can simply remove and drop out. Optionally, we could remove the gateway.
Well, that's not a hard requirement — there might be certain use cases where you leave your Spring Cloud Gateway in the mix — but optionally it can be removed. Secondly, we'll strip out our Eureka dependencies, and any code annotated with @LoadBalanced, @EnableDiscoveryClient, or any of those related constructs can be removed. And lastly, as I mentioned on the previous slide, our Eureka registry can be completely retired.

Now I'm going to show you a couple of code snippets that are a little closer to how you would program this, imagining that Istio is your control plane. But ultimately this gets materialized into Envoy configuration to actually program the data path, and any control plane that programs Envoy would typically behave about the same, even if the semantics are slightly different. If we're utilizing Envoy as our ingress gateway, we simply need to tell our ingress, and then program our Envoy instances, where traffic can be routed. So it's a very simple definition: traffic coming in on my /redis prefix at my endpoint gets sent to my todos-redis destination in my cluster. In my code, as I mentioned on the last slide, there's a bunch of stuff extracted out of your Java code, and nothing needs to be added — I can still refer to my service, todos-redis, on port 8080. And if you look at what's actually manifested in Envoy, we can ask Envoy: what routes do you know how to send traffic to, and what endpoints exist for those routes?
And it would report back to us that, why yes, todos-redis exists, and here are the actual internal Kubernetes IPs I can send traffic to — or, if traffic is going across clusters, possibly an externally routable IP.

Going part and parcel with that service discovery and initial ingress load balancing is more full-scale client-side load balancing. If we haven't already done so, we can strip those @LoadBalanced or discovery client annotations out of our Java application. If we've done something more sophisticated — a custom implementation for the Ribbon load balancer, or a more sophisticated algorithm or configuration for it — that can also drop out, and now we can program our Envoys for how we want to shape traffic. I have three very simple examples here of how we can split traffic across multiple instances: in this case, sending 95 percent of our traffic to the original version of our cache service and 5 percent to the new version, using application labels under the covers. We're also giving it a little information for when things misbehave: a half-second timeout, retry three times, and which scenarios we should retry on. We can also tell Envoy the actual load balancing strategy — the traffic policy that's going to be applied when we establish these connections.
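For reference, here's a sketch of that configuration as Istio resources. The host and subset names (todos-redis, v1, v2) and the exact retry settings are assumptions drawn from the example just described, not the repo's actual manifests:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: todos-redis
spec:
  hosts:
  - todos-redis          # the same hostname the Java code already uses
  http:
  - route:
    - destination:
        host: todos-redis
        subset: v1
      weight: 95         # 95% of traffic stays on the original version
    - destination:
        host: todos-redis
        subset: v2
      weight: 5          # 5% canaries to the new version
    timeout: 0.5s        # fail fast after half a second
    retries:
      attempts: 3
      retryOn: 5xx,connect-failure
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: todos-redis
spec:
  host: todos-redis
  subsets:               # subsets are resolved via pod labels under the covers
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  trafficPolicy:
    loadBalancer:
      simple: ROUND_ROBIN   # the load balancing strategy Envoy applies
```

Notice that none of this touches the application: it's applied to the mesh and materialized as Envoy route configuration.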
And then lastly, one really interesting thing we can do, since we're controlling all traffic: we can actually control how we connect to things outside of our service mesh, or even outside of our Kubernetes clusters, and have that do things like TLS mutual authentication. When I get to the security slide in a bit, I'll talk about why that's pretty significant for our applications.

Now, we also want to make sure our applications are resilient, and that endpoints are ejected from our client-side load balancers as needed. In our application, we can drop any Hystrix or Spring Cloud Circuit Breaker dependencies from our Gradle build or Maven POM files, remove all the related annotations, and — if we've implemented any circuit breaker factories, or if we're using the more modern Resilience4j — remove the config, the properties, and the annotations we've placed on our methods for that. Then we can tell any Envoy sidecar how it should behave when it connects. In this example, we're fictitiously saying my API service is connecting to four pods of my cache service, and one of them is misbehaving. We're able to configure it so that it can only make ten connections at once, it's going to time out after five seconds, and we define how we should eject
endpoints when they don't behave correctly. In this configuration, we're saying: in any ten-second window, if we see ten 5xx errors, we should eject that endpoint for a minute. Obviously there are more robust and verbose configurations we could make, but it's very easy to transparently inject this into traffic flowing through the system.

Now, as a little extra credit, I'll talk about some security items, because that's one inherent capability this transparent networking layer — the service mesh — gives us. Within our application, we can strip out any of the complex truststore and keystore work we may have had to do in our JVM. We can connect to services simply over TLS and rely on the service mesh, and our Envoy sidecars, to handle all the security, inbound and outbound. In this example, let's say we only want our cache service to be talked to by the API service; things closer to the edge, like our web UI or our ingress, shouldn't connect to it directly. We're very easily able to author policy that says mTLS is strictly enforced for all traffic, and that we'll inspect the certificate presented by the client and only allow principals identified by the todos-api certificate to connect. That's the X.509 certificate issued to every single Envoy sidecar in our application, each with a unique identity — so we can author both authN and authZ policy that looks like this. This is controlling service-to-service communication and the security aspect of it.
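As a rough sketch, the two pieces just described — the ejection behavior and the service-to-service policy — might look like this in Istio terms. The namespace, labels, and service account names here are illustrative assumptions, not the repo's actual manifests:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: todos-redis-resilience
spec:
  host: todos-redis
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 10     # at most ten connections at once
        connectTimeout: 5s     # give up on a connection after five seconds
    outlierDetection:
      consecutive5xxErrors: 10 # ten 5xx errors...
      interval: 10s            # ...within any ten-second window...
      baseEjectionTime: 60s    # ...ejects that endpoint for a minute
---
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: todos
spec:
  mtls:
    mode: STRICT               # mTLS strictly enforced for all traffic
---
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: todos-redis-allow-api
  namespace: todos
spec:
  selector:
    matchLabels:
      app: todos-redis         # applies to the cache service's sidecars
  action: ALLOW
  rules:
  - from:
    - source:
        # only the identity carried in the todos-api certificate may connect
        principals: ["cluster.local/ns/todos/sa/todos-api"]
```

Again, all of this is enforced at the sidecars, with nothing added to the Java code.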
We can also control request-level security. This varies very widely in how we'd implement it in our application using Spring Security, so I'm not going to cover what we might be able to remove from the application — that could probably be a 30-minute talk on its own. But within the actual service mesh, we can author policy that validates tokens. In this case, it's going to talk to Keycloak, inspecting the JWTs presented with the request, to validate both that the token is valid and that it was issued by our identity provider. And in this example, we're looking at all the claims associated with it, to make sure the user invoking our service has the to-do user role. So we're able to identify not only the service-to-service communication, and control that traffic at a very fine-grained level, but also the end user who is invoking our service — and propagate that throughout our service mesh and enforce it using Envoy as a policy enforcement point.

So, to wrap this up, or to summarize this new architecture we have with service mesh and Spring: Spring can still bring to the table a very easy way to get started building microservice applications, with Spring Boot and all the libraries that make consuming data services and messaging services, reactive patterns, building UIs — all the things Spring does great — able to be built into your application. We can decouple the common cloud native patterns out of our application to simplify the architecture, and have the platform provide them for us. That's going to unlock and simplify things any time we want to take a polyglot approach and introduce non-Java services into our application architecture. It also greatly simplifies things once we begin to span multiple Kubernetes clusters, multiple cloud environments, or multiple cloud providers themselves, and the semantics for expressing that to our applications are much, much easier. And then
this also unlocks the opportunity to begin to introduce non-cloud-native applications into our cloud native architecture. And lastly, as we saw in the extra credit portion, we can very easily, consistently, and transparently enable all of our applications by default with some pretty sophisticated and advanced security primitives, moving us toward initiatives like zero trust architecture within our applications. So: service mesh and Envoy are really like peanut butter and chocolate with Spring — you put them together, you're going to get a really great recipe. Thanks for your attention. With that, I think we have just a few minutes for questions.