So hi everybody, thanks for coming to our talk. I'm Angela, and this is Gabe. And today we'll be talking about sidecarring all the things in Cloud Foundry, so Envoy, Istio, and CF. So just a little bit of background. The two of us, a few months back, started researching what it would look like if we put sidecars into Cloud Foundry. So this talk really started with that nebulous initial research, but a lot has happened since then, and we'll be delving into that as well. So, brief outline: first we're going to talk about what exactly your microservices need, and which of those features Cloud Foundry provides today versus which it's missing. We'll then look into sidecars and how they help, before delving into the specific technologies of Envoy and Istio. And then lastly we'll cap it off with work in progress and what's next. So you're building some microservices. Good for you. You've broken up this giant monolith and you now have all these separate processes running small tasks. But now you have new operational complexity: how do all of these microservices actually talk to one another? How are you setting that up? So you probably want your microservices to be set up so that they're able to retry, that they're able to load balance, that they're engaging in mutual TLS so you have security. You want to make sure that you're providing all of these features, that there are configurable timeouts, that you're collecting metrics. You want to make sure that your microservices are behaving correctly and engaging with one another properly. And let's look at a couple of these in depth. So let's look at the case of retries with microservices. You've split up your front end and your back end, and you have a request coming through. You want to make sure that if you're asking for data from one back-end instance and it doesn't have the data, you retry on another instance. You don't want to just drop the request if there's something else that can serve the response. Another case would be load balancing. In this case, you want to make sure that if you have multiple instances of your back end, your front end isn't just sending all traffic to one instance. The whole point of having multiple instances is that you want to be able to scale, and you want to make sure you're load balancing across all of them. Because if you're sending all of your traffic to one instance, you could be causing that instance to catch on fire because you're putting too much load on it. And then your front end catches on fire too. But if you load balance between all of these instances, then not only should your back end not catch on fire, because you're sending less traffic to each instance, but you're also just following best practices for load balancing. Transport security: you want to make sure that all of your microservices are communicating securely. This is most often done by using certs on both the front end and the back end to engage in mutual TLS. These are all best practices for your microservices. And the question is, how do I do all these best practices?
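To make the retry idea a bit more concrete, here is a minimal sketch of client-side retries in Go. The back-end addresses are hypothetical placeholders; the point is just that the client tries each instance in turn rather than dropping the request when the first one fails.

```go
// Minimal sketch of client-side retries: try one back-end instance,
// and if it fails, move on to the next instead of dropping the request.
// The instance addresses here are made up for illustration.
package main

import (
	"fmt"
	"net/http"
)

var backends = []string{
	"http://10.0.1.10:8080", // back-end instance 1 (example address)
	"http://10.0.1.11:8080", // back-end instance 2 (example address)
}

func getWithRetry(path string) (*http.Response, error) {
	var lastErr error
	for _, b := range backends {
		resp, err := http.Get(b + path)
		if err == nil && resp.StatusCode < 500 {
			return resp, nil // success: stop retrying
		}
		if err != nil {
			lastErr = err
		} else {
			lastErr = fmt.Errorf("backend %s returned %d", b, resp.StatusCode)
			resp.Body.Close()
		}
	}
	return nil, fmt.Errorf("all backends failed: %w", lastErr)
}

func main() {
	if resp, err := getWithRetry("/data"); err != nil {
		fmt.Println("request failed:", err)
	} else {
		fmt.Println("got response:", resp.Status)
		resp.Body.Close()
	}
}
```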
All right. So Cloud Foundry today provides some of this, but not everything, so we'll talk about that. One way to get some of these features today is to use the Cloud Foundry router. But before we talk about that, we're going to do a little orientation. You might have a data center or a private cloud, and it's connected to the public internet somehow. You have a load balancer. You have your Cloud Foundry router. Your apps and services are on the inside. Networking folks use the jargon of north-south to talk about the direction of flow coming in from the public internet down into your data center, because the diagrams are generally drawn this way. And then one way for applications to talk to one another is to go back out all the way to the top and come back in through the front door. This is called hairpinning, and if you wanted apps to talk to one another, they could take this north-south approach. But the better thing would be to have the applications talk to one another directly, and this is called east-west traffic. We'll be using these terms a little bit going forward. So in the north-south approach, using the Cloud Foundry router, you have traffic coming in from the load balancer. And the router actually provides some of these features that microservices should be using. Not all of them, but it has retries and load balancing down, and metrics collection. The others, not so much right now. But suppose you didn't want to do this hairpinning thing, and you didn't want app A to have to go all the way out through the front door and come into app B. And by the way, when you do this, you have to expose app B through your load balancer, which is not so great, because then anything can reach app B, not just app A. So if you wanted to avoid this, you'd maybe take advantage of the new container networking features that our team introduced recently, which allow app A to reach app B directly. And that's great, but that's just an IP network. There's no load balancing. There are no client-side retries. There's none of that stuff. And so this makes people kind of sad, because they don't have any of these features. And that's really the motivation for introducing sidecars and the rest of this talk. Is this me? OK. So one way to approach this is to say, well, we can introduce a library. We'll build Eureka or Spring Cloud Services or one of these things into the application, and it can provide these features, and the developer still doesn't have to think about it. And this is cool. This is the in-process architecture. Your requests between your front end and back end can get mediated through this thing that you shouldn't really have to think about. It makes your developers' lives easier. But remember, it's not just a monolith. You have all these little microservices, and you put this library in all the microservices. But actually, your microservices are probably not all written in the same language. You have some mixture of languages, and therefore you have a mixture of libraries that you need. And the libraries are subtly different. They have different feature sets, different configuration, different quirks. They're going to interact with each other differently, and that's going to be frustrating. Using multiple languages should not be painful, but it is if you have to use this library approach. And that really gets us to sidecars and how they can help. Yeah. So let's look at sidecars as a better alternative to this in-process architecture. We can think of this as an out-of-process architecture. So now we have our application, and running alongside our application is a separate process. Your application can just focus on your business logic, and the separate process can provide all these features that we want, like retries and load balancing. It's also important to note that this separate process is used both by the application when it sends traffic out and when traffic comes in.
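As a rough illustration of that out-of-process idea, here is a toy localhost proxy in Go. The listen port and upstream address are made up, and a real sidecar like Envoy does far more, but the shape is the same: the app only ever talks to localhost, and a separate process decides where the traffic actually goes.

```go
// A toy illustration of the out-of-process architecture: the app talks
// only to localhost, and this separate process forwards the traffic.
// Ports and the upstream address are invented for the sketch.
package main

import (
	"io"
	"log"
	"net"
)

func main() {
	ln, err := net.Listen("tcp", "127.0.0.1:15001") // sidecar listens on localhost
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Fatal(err)
		}
		go func(c net.Conn) {
			defer c.Close()
			// In a real sidecar this is where retries, load balancing,
			// and mutual TLS would live; here we just forward bytes.
			upstream, err := net.Dial("tcp", "10.0.2.20:8080") // example back end
			if err != nil {
				log.Println("dial upstream:", err)
				return
			}
			defer upstream.Close()
			go io.Copy(upstream, c)
			io.Copy(c, upstream)
		}(conn)
	}
}
```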
And this purple box here is what we call a sidecar. So what's a sidecar? A sidecar is a separate process that runs alongside a microservice. It's a proxy that the app can reach via localhost, and it handles both layer 4 and layer 7 traffic. The sidecar can proxy both ingress, traffic coming into the container, as well as egress, traffic going out of the container. So the sidecar, in its ideal form, is providing all of the features that we want. In addition, when every single component has a sidecar, we get a service mesh, which is an infrastructure layer that sits between your services and your network. And the great thing about having a service mesh is that you have this unified control plane talking to each of these individual sidecars, so you make sure that each of these services co-located with a sidecar is getting the same set of features. So you don't have the concerns about slightly different quirks and configurations that you might have with an in-process architecture. So let's look at how sidecars can solve some of the problems we talked about earlier. Let's look at client-side load balancing. In this case, when a front end wants to talk to multiple instances of a back end, it will send its traffic through the sidecar, and the sidecar will know to do round-robin load balancing: connection one, connection two, connection three. You could get even fancier with sidecars. Let's say you want back end one to be your main back end. You could tell the sidecar, hey, I want to send 80% of my traffic to instance one and only 10% to each of the other two. So you're getting more features in terms of load balancing by using the sidecar. So if we look back at all these features that we want our microservices to have, and we see what the GoRouter provided and what container networking currently provides, we can see that with sidecars we're getting all of the features that are already provided, plus some that aren't provided currently. And this is for both north-south and east-west traffic. So you're finally getting a unified experience for north-south and east-west in terms of this feature set. OK, to make this real, we can talk about something that's actually in flight right now: how sidecars can help Cloud Foundry. So the routing team has faced this persistent problem called the route integrity problem. If the routing control plane goes down, how do we avoid misrouting requests to applications? And to explain this, I'm going to first tell you a little bit about how the routing control plane works. Every application instance is on a Diego cell. Every Diego cell has a route emitter. And the route emitters push messages through this message bus. The messages say: if you're trying to reach this named route, then you can find an instance of a back end for it on this particular Diego cell at this port. Those messages get pumped through the message bus, and they get collected by the router into this route table. The route table is an in-memory representation of that data, so that when a request comes in from a user, there's a lookup into the route table, and then the request can be load balanced and retried across the various back ends that correspond to the requested route. And this is great. This works well. But sometimes that control plane becomes unavailable, and this can lead to problems.
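To give a feel for what that route table amounts to, here is a stripped-down sketch in Go: registration messages from the route emitters update an in-memory map, and incoming requests look up a route and get spread across its back ends. The message fields and names are invented for illustration, not the actual message format on the bus.

```go
// A stripped-down sketch of the router's in-memory route table.
// Registration messages keep the table fresh; lookups pick a back end
// for each incoming request. Field names are illustrative only.
package main

import "fmt"

// registration is roughly what a route emitter announces:
// "this route can be served on this cell at this address".
type registration struct {
	Route   string // e.g. "app-b.example.com"
	Backend string // "cell-ip:port"
}

type routeTable struct {
	backends map[string][]string
	next     map[string]int // round-robin cursor per route
}

func newRouteTable() *routeTable {
	return &routeTable{backends: map[string][]string{}, next: map[string]int{}}
}

// register applies a message from the message bus to the table.
func (t *routeTable) register(r registration) {
	t.backends[r.Route] = append(t.backends[r.Route], r.Backend)
}

// lookup returns the next back end for a route, round-robin style.
func (t *routeTable) lookup(route string) (string, bool) {
	bs := t.backends[route]
	if len(bs) == 0 {
		return "", false
	}
	b := bs[t.next[route]%len(bs)]
	t.next[route]++
	return b, true
}

func main() {
	table := newRouteTable()
	table.register(registration{Route: "app-b.example.com", Backend: "10.0.2.20:61001"})
	table.register(registration{Route: "app-b.example.com", Backend: "10.0.3.14:61005"})
	for i := 0; i < 3; i++ {
		if b, ok := table.lookup("app-b.example.com"); ok {
			fmt.Println("forwarding to", b)
		}
	}
	// If the message bus goes away, this table simply stops receiving
	// updates, which is exactly the staleness problem described next.
}
```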
So suppose that message bus catches on fire. The router, really the only thing it can do in this case, is sort of lock down and say, well, I don't know about any new messages that are coming through. The best thing I can do is hold on to the representation that I last knew about and continue routing requests according to that state, because I don't have any more updates. So it allows requests to continue to flow through. And that actually works OK until some back end gets rescheduled, until some application instance gets restarted and moved around. Now the place where app B used to be on cell 2 gets replaced by app C, and the route is stale. The router is forwarding traffic that was supposed to go to app B, but it's actually sending some of it to the wrong application back end. And that's a really big problem that makes users angry. They asked for app B, and they got app C instead. And so this is where sidecars can help us, because by introducing a sidecar into every application instance, the router is able to do mutual TLS with that sidecar. The app developer doesn't need to know anything about this; they're not aware of it at all. But there's this mutual TLS connection happening, and so the router can, in particular, expect to see a particular certificate come back from the server. The certificate coming back from the server encodes the container identity, the application identity of the app running there. And then when the request maybe ends up at the wrong back end, the GoRouter is still able to reject that back end. And this is how we can keep application users happy. The requests are still going only to the correct back ends, even when the routing control plane might be unavailable. Just a little summary: we added a sidecar to every back-end container, doing TLS, and so things stay up even when the system is partially unavailable. This work is actually already in flight, which is really exciting. The latest routing release does this. The latest Diego release does this. The router now does mutual TLS to back ends, without really knowing what on the back end is serving TLS, and the Diego release is injecting a sidecar proxy into every application container.
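Here is a rough sketch, in Go, of the router-side check just described: only keep a connection if the certificate the sidecar presents matches the application the router expected to reach. The way the identity is encoded below, an "app:&lt;guid&gt;" entry in the certificate's OU field, is an assumption for illustration rather than the exact Diego instance-identity format.

```go
// A rough sketch of a router-side identity check: dial the back end
// over TLS and reject it unless the presented certificate carries the
// expected application identity. The "app:<guid>" OU encoding is an
// assumption for illustration, not necessarily what Diego emits.
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
)

// verifyAppIdentity checks that the leaf certificate names the app we
// meant to reach.
func verifyAppIdentity(cert *x509.Certificate, expectedAppGUID string) error {
	want := "app:" + expectedAppGUID
	for _, ou := range cert.Subject.OrganizationalUnit {
		if ou == want {
			return nil
		}
	}
	return fmt.Errorf("backend presented %v, expected %s", cert.Subject.OrganizationalUnit, want)
}

func dialBackend(addr, expectedAppGUID string, cfg *tls.Config) (*tls.Conn, error) {
	conn, err := tls.Dial("tcp", addr, cfg)
	if err != nil {
		return nil, err
	}
	state := conn.ConnectionState()
	if len(state.PeerCertificates) == 0 {
		conn.Close()
		return nil, fmt.Errorf("backend presented no certificate")
	}
	// A stale route means this cell now hosts a different app; the check
	// fails, the connection is dropped, and the router can retry another
	// back end instead of misrouting the request.
	if err := verifyAppIdentity(state.PeerCertificates[0], expectedAppGUID); err != nil {
		conn.Close()
		return nil, err
	}
	return conn, nil
}

func main() {
	// Sketch only: a real router would also present its own client cert
	// and verify the server chain against the platform CA.
	cfg := &tls.Config{InsecureSkipVerify: true}
	if _, err := dialBackend("10.0.2.20:61001", "example-app-guid", cfg); err != nil {
		fmt.Println("rejecting back end:", err)
	}
}
```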
And so now we can talk about specifics. So you saw me use this little logo inside the proxies: this is Envoy. Envoy is a layer four and layer seven proxy. It's written in C++. It has a very low memory footprint, which makes it appropriate for putting one in every application instance. It's designed for the service mesh model, and it has extremely dynamic configuration. You can essentially completely reconfigure an Envoy without restarting it. We're not taking advantage of that in the route integrity approach I was just talking about, but it makes Envoy great for future use cases, which we'll get to later. About the project: it was built by Lyft, it's open source, and it's a recent addition to the Cloud Native Computing Foundation. How does it work? It's a proxy. Requests come in, they get transformed, requests go out. Internally, it has representations of listeners, which are things that receive TCP connections; filter chains, which can transform connections; and then cluster definitions for where those connections get routed when they're going out of the system. And so with this sort of data plane in mind, you then ask, how does this all get configured? Envoy expects there to be a configuration server that it's pulling from, or maybe getting push updates from. It uses gRPC in the latest version to pull down definitions of all of these things. And so this sort of blank-slate Envoy can be reconfigured at runtime to have different sets of listeners, different sets of filter chains, and so on. There's this Envoy API that it expects to consume, and Envoy itself is totally agnostic as to who provides that API. But in order to use Envoy in the fanciest way, you'd expect there to be a pretty fancy server behind the scenes that's configuring it with all that information. And that's the motivation for Istio. Yeah. So we need something to serve the dynamic configuration to each of our Envoy proxies, and that leads us to Istio. So what is Istio? Istio is an Envoy control plane. It's providing all the information that the Envoys expect via the Envoy API. It's written in Go. Today it runs on Kubernetes, but the community wants it to be cross-platform, so we want Istio to be able to run on Cloud Foundry as well. A little bit about the project: it was built by Google with help from IBM, and it's open source. So how exactly can Istio help with our sidecar use case? Here, we're going to treat Istio as a green box. When we're doing load balancing with mutual TLS, the Envoy will ask Istio for all of its back ends and get that information back. It's then able to engage in mutual TLS with multiple instances of the back end. If a back end goes away, Istio can update the Envoy to let it know the new set of back ends, and the front end will then engage in mutual TLS with only the back-end instances that are still around. So that's great. But how does Istio actually work? What is that green box actually composed of? We see here that Istio is composed of three main components: Pilot, Mixer, and Auth. Pilot is the component doing the dynamic Envoy configuration. It's letting the Envoy know what its setup should be, where it should be routing to, all of that information. The Mixer is doing policy checks and telemetry. So you could envision, far down in the future, that Mixer could actually be telling app A that it can't talk to app B, or app B that it can't talk to app A. And Auth here is the management system for distributing TLS certificates to each of the Envoy instances. So let's focus in a little bit on Pilot, because that's the interesting part with dynamically configuring the Envoy. Istio's Pilot is composed of three main parts: a platform adapter, an abstract model, and an Envoy API. You have your cloud platform, whatever it is, serving information via a platform adapter. This platform adapter transforms the information into an abstract model that Pilot expects, and Pilot then takes that abstract model and transforms it into the Envoy API so that it can serve the dynamic configurations to each of the Envoy instances. In the case of Cloud Foundry, we could see the routing tier serving as the back end, and we could write a CF-specific adapter to take the information from the routing tier and transform it into the abstract model that Pilot expects.
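A loose sketch of that CF-specific adapter idea: something reads route data from the routing tier and translates it into a platform-neutral model, which a Pilot-like control plane would then render as Envoy configuration. The types below are invented for illustration; Pilot's real interfaces are considerably more involved.

```go
// A loose sketch of a CF platform adapter: translate routing-tier data
// into an abstract, platform-neutral model. All types are invented for
// illustration; they are not Pilot's actual interfaces.
package main

import "fmt"

// cfRoute is a stand-in for what the CF routing tier knows: a route
// hostname mapped to the app instances currently serving it.
type cfRoute struct {
	Hostname string
	Backends []string // "cell-ip:port" for each app instance
}

// serviceInstance is the abstract-model view the control plane consumes.
type serviceInstance struct {
	Service string
	Address string
}

// cfAdapter translates routing-tier data into the abstract model,
// which a Pilot-like control plane would then turn into Envoy config.
func cfAdapter(routes []cfRoute) []serviceInstance {
	var out []serviceInstance
	for _, r := range routes {
		for _, b := range r.Backends {
			out = append(out, serviceInstance{Service: r.Hostname, Address: b})
		}
	}
	return out
}

func main() {
	routes := []cfRoute{{
		Hostname: "backend.apps.example.com",
		Backends: []string{"10.0.2.20:61001", "10.0.3.14:61005"},
	}}
	for _, si := range cfAdapter(routes) {
		fmt.Printf("service %s -> %s\n", si.Service, si.Address)
	}
}
```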
All right, so as we mentioned, some of this work is already in progress, and some of it's upcoming. Diego has already done the work to give every instance its own certificate with its identity encoded into it. They've got some experimental work to provide an Envoy to every container. Today, they're injecting it into the application instance, the same way they do the SSH daemon and the health checks. The routing team already has experimental features to do mutual TLS to back ends. The container networking team is working on providing DNS between applications on the container network, which is going to be helpful for east-west features. And then going forward, Garden just incepted a new thing called pods, which is multiple containers sharing a network namespace. So you could imagine your application being one container and then having an Envoy next to it as a separate container, with its own file system and its own memory namespace and so on. And the nice thing about that is that it'll make it easier for us to swap out the proxy. Maybe if you don't want to use Envoy, you can use something else. If you want to upgrade your Envoy, you might be able to do that. And it also allows Diego to take advantage of this feature and, say, inject other things alongside your application instance as well. If you want to have some logging system in there, you could put that as another pea in your pod. The container networking team is also going to take advantage of this Envoy to do client-side load balancing. And then the routing team is thinking about maybe BOSH-packaging Envoy and using it as an edge router, maybe alongside of or perhaps in place of the GoRouter at some point in the future. Longer term, we can imagine putting Envoys on BOSH-deployed service VMs, using that to extend the service mesh all the way out to services. You'd get mutual TLS between your app instance and your service instance for free. That would be pretty great. You could also imagine using Envoy as an egress proxy. For those of you who are maybe in environments where people really want an application instance to have a static IP when it's connecting to some external service: if that application's traffic were flowing through an egress proxy, which could be dynamically configured to have that static IP, that would be pretty great and a good use case for Envoy. The other thing is that some of these fancier features, reconfiguring the control plane, injecting policies like Angela was talking about, or otherwise reconfiguring this service mesh, are controlled via Istio. And we're thinking about how much to expose those features to users of Cloud Foundry versus how much the platform should just manage for you. We're really interested in getting feedback from the community on a lot of these questions. So if you have thoughts about how much self-service there should be, how much configurability there should be, we would really like to hear your thoughts about that. So, just quickly, there are sort of two ways you can envision Istio fitting into Cloud Foundry. You can envision a shared Istio control plane for the whole installation, where multiple tenants, multiple Cloud Foundry orgs or spaces, would all be controlled by the same control plane. This would require us to introduce some concepts of multi-tenancy into Istio, which we're planning to do. The other model you can imagine is that every org, or maybe every space, gets its own Istio control plane, and users have complete control over that and can manage it however they like. This is a little trickier to imagine fitting in with the shared routing tier, and for that reason we're less inclined to go with this model. But again, if you have thoughts about this, we would like to hear them. And yeah, now we can sort of wrap up. Yeah, so I guess major takeaways.
Sidecars are pretty cool. They give you a lot of the features that your microservices would want, and it's transparent to the app developer, so you're getting all of it for free. Envoy is one such sidecar, which we're integrating into CF application instances right now. And we're looking at how we can integrate Istio in order to manage these Envoys and get even more functionality out of them. So we have some acknowledgments. This is definitely a large track of work that spans several teams and several communities. So we just wanna give shout-outs to the CF community, in particular Diego, Networking, and Routing, and also the Envoy and Istio communities, for being really responsive and receptive to questions or concerns we had. Also, if you think this is really cool, there are a lot of other sessions we'd love to plug that you might be interested in. And with that, we really do want your feedback. We wanna hear what you think this would be really great for, and what you might have concerns about. So please ask away, and you can find us in the Cloud Foundry Slack in the sidecars channel. Thanks. I have a microphone and I'm gonna turn it on. There we go. Questions? I have many questions, but this time I'm gonna let other people start before I... Hi, Jack. Hello. How can you leverage the sidecar pattern with a Spring Boot application on the Pivotal platform? What do we need, where can we start? Yeah, we're talking with the Spring folks quite a bit about how Spring fits into Envoy and Istio. I haven't heard any clear vision for how all that stuff works together. I think in some ways they substitute for one another, and in some ways they complement one another. If you have questions about that, please join the Slack channel and ask them, and we can pull some Spring folks in and try to find answers for you. In the north-south picture that you had, you had the GoRouter at the top and then Envoy was involved in there. Is the GoRouter included in the control plane, when you mentioned that if the control plane goes away, potentially north-south traffic would not be interrupted? No, the GoRouter, you can think of it as a data path element. It's just a proxy, right? So traffic goes through the GoRouter to come in, but the GoRouter gets configured via some control plane, which in Cloud Foundry today is the NATS message bus. In the same way, the Envoy is really a data plane element, and it's configured by some other control plane like Istio. Does that answer your question? Yeah, is the goal to pull from Envoy's service discovery mechanisms to populate a routing table, so that when you're using north-south traffic you'll be able to... Not really, do you wanna take a look at this? So is the question like... It's kind of like the forward slide that you had, like how is Envoy gonna play into the picture? Oh, I see, going forward. Yeah, so Envoy and Istio don't have their own state distribution system. They expect the platform to tell them where things are. So Cloud Foundry still has a role to play in telling those systems, you can find these back ends at these IP addresses and ports. So that doesn't change. The only thing we're talking about going forward, for the future, is that the GoRouter is a layer four, layer seven proxy, or sorry, the GoRouter is a layer seven proxy and the TCP router is a layer four proxy. You can imagine those functions being served by an Envoy that's BOSH-deployed instead of those things, but this is very long term, no promises, we don't really know. Thanks. Yeah.
I so badly wanna ask questions. I'm super excited. I don't know if you're excited for the answer, but I'm excited for the question. We'll see. So this is apparently, or obviously, an effort that spans multiple teams, right? You're representatives from two of those individual teams, but many more teams will be involved in making that whole scenario happen. So is there anyone, let's say, looking at, or governing, or I don't know what the correct verb is, this entire scenario? Like, who's the PM for that scenario? Who's taking care that this thing goes in the direction that users want? Yeah, I think that's something we're still trying to figure out as well. This definitely spans multiple teams, and right now it's mostly just been coordination among the PMs of each of the teams, but there is a lot of talk about making sure that we don't duplicate work, and about what team is doing what. For example, what team is going to BOSH-package an Envoy release, or what team is going to integrate Istio. So there's lots of chatter, but this is very early days. Yeah, the short answer is no one's PMing this thing, there's no team for it. If you have things that maybe you think we're missing, join the Slack and tell us about it, and if you feel strongly that this deserves its own product lead or something like that, then that's great, you should share that too. I don't want to speak for the foundation or any of the product people, I'm just an engineer, but we're excited about this stuff and we think other people would be too. I have a question, sorry. For five years, the definition of Cloud Foundry, by the behavior of all engineers, has been that it stops at the service broker. How services get implemented, and I mean back-end services, not other apps, and how their relationship works and how routing to them works has sort of never been in scope. Even the routing team, when they came up with the new TCP routing stuff, it was like, no, no, it's still just for clients to talk to apps. I've wanted a sidecar, I didn't know it was called sidecars, it's good to have a name. I've wanted that inside app containers for many, many years, so that services can dynamically update what's going on there, for the security of it and everything. It was one bullet point, and it sort of got brushed off as a BOSH thing. To me, the most valuable thing from this is that it can finally be the layer that goes out to the back-end services and that whole relationship, which we've sort of abandoned and abused, from my perspective as an author of service brokers. Is it naturally going to evolve, or is the nature of the teams working on this stuff, which perhaps comes back to this question, such that it'll just be apps only, talking to other apps, and it's all just about apps? So there's a talk, Istio and the Open Service Broker API. You should go to that talk and ask those guys about that. No, I think... I'm happy for a segue, thank you very much. I think that makes a lot of sense, and we should do that. The services aren't just going to be BOSH, right? We're going to have services on Kubo, we're going to have services in other things. Right, so the question is how do you define an API so that those services all speak the same language at the service mesh layer? Desperately. If anything like that turned up, I would rewrite, get rid of every routing thing I've ever implemented, which is awful, and go to something that was more native. Yeah, so I think that makes sense.
We should go do that. I don't have the slightest clue how to start, but we're open to that. I desperately hope it isn't at the same time I'm talking. Yeah, I mean the other bit I think is related to that: if you have Kubernetes and you have a Cloud Foundry, or you have two different Cloud Foundries in different data centers, and they have different Istio control planes, how do you get those things to know about each other and talk to each other and extend the service mesh across them? And swapping certificate authorities is the first step, but then there's a bunch of other stuff you have to do too, right? So yes, that sounds great, we should do that. All right, thank you very much. Thank you. Thank you.