Hello, can you hear me? Test, test. Sounds all right. All right. Thank you all for coming today. My name is Aaron Hurley, and I'm an engineer on the Cloud Foundry routing team. Today I'm here to talk to you about how the routing team has been exploring bringing Istio and Envoy into Cloud Foundry, in particular for ingress routing needs. So first, a bit of housekeeping. Please note the locations of the surrounding emergency exits and locate the nearest lit exit sign to you. In the event of a fire alarm or other emergency, please calmly exit to the public concourse area. Emergency exit stairwells leading to the outside of this facility are located along the public concourse. For your safety in an emergency, please follow the directions of the public safety staff. All right. Again, a quick disclaimer: this talk focuses on north-south traffic, not so much east-west. So what does this mean? This is a networking term that folks use, but it's essentially the traffic traversing from the public internet into a private data center or cloud. So here's the example: coming in through the load balancer, hitting a router, and then going to an internal service or application. If an app wanted to talk to another app, it would have to hairpin back out. Whereas east-west traffic is where apps or internal services are able to talk directly to each other without needing to leave the data center or cloud. So let's imagine a scenario. Say you're an application developer, so maybe not much of a stretch for many of you. You currently have a V1 application that's very stable and does the trick, but you've been working hard on what V2 looks like. And you're pretty excited about these features that you want to get in front of your users. You're so excited, and you're so confident. You've test-driven it. You've tested it as much as you can. So let's go ahead and push that.
And I'm sure we can all foresee a problem with just updating your current application in flight. If it catches on fire, which I'm sure it's bound to do, all your traffic disappears, and that application has some downtime. So let's instead figure out a better deployment process here. Instead of removing V1, let's just deploy V2 alongside it. And by using the same route, we're able to map 50% of the traffic to each of these instances. This way, when the inevitable bug happens and V2 catches fire, only 50% of your traffic is affected. That's still not so great. But we're clever engineers, right? So why don't we just scale up V1 and throw the odds in our favor? But that's super hacky and very wasteful. So what if instead we had a bit more fine-grained control? And when we decided to push V2, we were able to select the amount of traffic that we want to send there. So we could say, hey, let's just send 10% there and do a more standard canary rollout. This way, if there are issues, we can quickly switch back, send all the traffic back to V1, and then go debug why V2 had issues. This is an example of weighted routing. And unfortunately, it is not currently available in Cloud Foundry. But today, I'm hoping to paint a picture of how Istio and Envoy could bring this into Cloud Foundry. So first, what is it that users want in a routing system? Users want security. This is an absolute must. Data in transit must be encrypted, with mutual authentication everywhere. Users want reliability. This is highly available routes, configurable timeouts, transparent retries, and load balancing strategies. Users want intelligent routing features, such as weighted routing or circuit breaking. And users want telemetry. They want to be able to collect detailed metrics on the particular apps and services that they care about. And if they notice that something is wrong, they want the ability to trace through the system and determine where that breakage is.
So what's wrong today? Why are we even considering other options? Why not just continue to build off of the products that we already have? Well, the obvious problem that we have right now is that we're missing a lot of features. Can we fix that? Sure, it's a very simple solution, right? Just implement more features. Unfortunately, this touches on another problem. Many times, bringing in these big features means the work will span many teams. And the inside of the GoRouter involves some legacy code. This is a pretty difficult combination, which results in low velocity and slow turnaround time for features. We have done some initial explorations to determine some features that are desired, like HTTP/2, and the effort that it would take to get them into the GoRouter. And unfortunately, it would involve some pretty massive refactors. So again, the ways that we can fix that problem are massive refactors or starting fresh, which, of course, has its own problems. Someone else would probably be giving this talk in another two years, then. Or throw more engineers at it, which I would love to do, but I don't have that power. Another issue today in the current Cloud Foundry routing product offering is that we have multiple products. We have the GoRouter and the TCP router. And the unfortunate thing is it's more of an either-or. You don't get the same set of features in both of them. So a user comes and asks: I really want feature X, and I see that's in the GoRouter, but my needs are really for the TCP router. Can we make that happen? And the answer is usually, unfortunately, no, it's pretty tough. Right now, like I said, it's either-or. And if we were able to fix this, we'd be able to simplify this offering and provide a single solution that is just more configurable. So guess what? Istio can solve these. Istio and Envoy are already very feature-rich. You can look at their docs, and it's just a laundry list of all the things that they support.
And we have determined, just through some initial exploration, that the routing team within Cloud Foundry could enable features quicker if we are able to focus on the integration of Istio and Envoy into the routing tier instead of trying to continue to evolve our existing products. In addition to this, Istio and Envoy both have large open source communities, and we're hoping that we can piggyback off of the velocity of a much larger team, since we are only a handful of engineers. Furthermore, we believe that these projects will continue to survive and be sustained, as they're being backed by companies that have successful track records of maintaining open source projects, such as Google and IBM. An additional benefit here is that it gives us a chance to rethink our system architecture and clean up a little bit of how the routing control plane works in Cloud Foundry today. And I'll touch more upon this in a later slide. And then lastly, this is a single offering. We can point users to this one thing, and they can configure it as they like. So now that I've mentioned these things, let's talk about what they are. So what is Envoy? Envoy is written in C++, and it's a lightweight Layer 4/Layer 7 proxy. It was open sourced by Lyft about a year and a half ago, and then after that, contributed to the Cloud Native Computing Foundation. Envoy operates primarily in two modes, sidecar and ingress. Sidecar is what's commonly deployed alongside a service within a container, and when these Envoy sidecars talk to each other, it builds what is known as the service mesh. For routing purposes, we're focusing on the ingress model, and so this is what brings public traffic into the cluster. Envoy is dynamically configured through various discovery service APIs. We'll talk about these on the next slide, but collectively they're referred to as the XDS APIs.
And finally, version two of these Envoy APIs, which was just released last month, now operates in a push-based configuration model, so this provides lower latency for updates and, additionally, lower resource consumption. So Envoy, how does it work? To put it simply, Envoy sits on the data path, takes in connections, and tells them where to go. If we break this configuration model down a little bit, Envoy consists of a few pieces. The first is listeners. Listeners tell Envoy how it listens. These are things such as: which port am I bound to? Which protocol should I be using? And then listeners are updated by the listener discovery service, the LDS. Listeners are mapped to routes. Routes tell Envoy where the traffic is sent. So this can be some sort of matcher, such as: what is the host header of this request? And routes are updated by the route discovery service. Routes then point to a cluster. Clusters tell Envoy how to send that traffic. So this is where the bulk of the routing rules, such as weighted routing, would come into play. This tells you whether or not to use TLS, or what your load balancing strategy should be. And this is updated by the cluster discovery service. And then clusters are really a group of endpoints. These are hosts that are able to receive this traffic. And as I'm sure you've guessed by now, this is updated by the endpoint discovery service. So here's a more concrete model of what this all looks like. You'll see on the left side here that you have these blue boxes. Each one of these is a listener. So you can configure Envoy to have many listeners, and you can see they vary in protocol and port number. And on these listeners, you also have your routes. Your routes are what will match, depending again on your header information, and point to a cluster. These clusters have the various routing configurations there and also contain the group of endpoints.
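The listener → route → cluster → endpoint chain just described can be modeled with a few simple types. This is a deliberately simplified sketch of the hierarchy for illustration; the type and field names here are invented and are not Envoy's actual API.

```go
package main

import "fmt"

// Simplified mirror of Envoy's configuration hierarchy. Each level
// is kept current by its own discovery service: LDS for listeners,
// RDS for routes, CDS for clusters, EDS for endpoints.
type Endpoint struct{ Addr string } // upstream IP:port able to receive traffic

type Cluster struct {
	Name      string
	UseTLS    bool       // "how to send": TLS, load balancing policy, etc.
	Endpoints []Endpoint // the group of hosts, populated via EDS
}

type Route struct {
	HostMatch string // "where to send": match on e.g. the Host header
	Cluster   *Cluster
}

type Listener struct {
	Port   int    // "how to listen": bound port...
	Proto  string // ...and protocol
	Routes []Route
}

// resolve walks the chain: match the request's host to a route,
// then hand back that route's cluster endpoints.
func (l Listener) resolve(host string) []Endpoint {
	for _, r := range l.Routes {
		if r.HostMatch == host {
			return r.Cluster.Endpoints
		}
	}
	return nil
}

func main() {
	appV1 := &Cluster{Name: "app-v1", Endpoints: []Endpoint{{"10.0.0.5:8080"}}}
	l := Listener{Port: 80, Proto: "HTTP", Routes: []Route{{"app.example.com", appV1}}}
	fmt.Println(l.resolve("app.example.com")[0].Addr)
}
```

The real system differs in many details, but the shape is the same: listeners own routes, routes point at clusters, and clusters group the endpoints that actually serve traffic.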
And these endpoints are simply upstream IPs that are able to handle the traffic. So again, on the right, these XDS APIs are really the bulk of telling Envoy how and where to send traffic. And these XDS APIs derive from Istio. So Istio: it's a platform for connecting, managing, and securing microservices. I like to think of it as an Envoy control plane. That's a little more concrete for me. It's written in Go, open sourced with large efforts by Google and IBM, and currently is on version 0.7. They're planning a 1.0 mid-summer, so that'll be the first more long-term-support release. All right, so let's chat about how Istio works. You can see both use cases of Envoy in this picture here. In the bottom left is the gateway. The gateway is an Istio configuration model for an ingress edge proxy. So you'll see here that this gateway Envoy will listen for incoming traffic, and then it will determine, again through its configuration, which IP, which endpoint in a cluster, that should get sent to. The other Envoys on here are just the sidecar Envoys. So Istio primarily is made up of three components. The first is Mixer. Mixer is really the policy engine behind Istio. It's responsible for authorization, as well as telemetry. So when a request comes into an Envoy, some of its attributes are then sent to Mixer, where Mixer will aggregate that data for metrics and tracing, and then also determine whether or not this request has the authorization to continue forward. The next piece is Istio Auth. This is really what's responsible for managing certs within the platform, and this happens asynchronously: Auth can just update the Envoys. And then the part that the routing team has been most interested in is Pilot. Pilot is responsible for updating the configuration for the Envoys, and this also happens asynchronously. So let's dive into Pilot a little more.
So at the heart of Pilot is the abstraction model, and really what happens here is Pilot will take in data from the rules API and the platform adapter, abstract that into Istio's models, and then provide that through the Envoy API so that Envoy understands it. To step through this a little bit: you have these traffic management rules, and these can be provided by a user on demand as needed, but this again comes into the abstraction model and is tied into Istio. Then you have the platform adapter. You can see that this is platform agnostic; it should be able to be used with various platforms. The one we care about here, or me at least, is Cloud Foundry, and the routing team has contributed the Cloud Foundry adapter within Pilot so that it is able to talk to a component that we created called Copilot. And Copilot is responsible for providing the routes from various Cloud Foundry components. Then ultimately this is served through the Envoy API and pushed out to the Envoys to update them. All right, so let's talk about how the GoRouter and TCP router get their routes in today's world. So currently in CF, when you do a cf push, Cloud Controller will tell Diego about its desired LRPs. Along with this is a little blob of routing metadata that Diego doesn't totally care about, but we'll see why it's important. Diego then does its thing. This is obviously simplified for routing purposes, but the important thing here is that Diego knows about the actual state of the world, and this is what the route emitter then sends to NATS to provide these routes to the GoRouter. So for those that aren't familiar with NATS, NATS is a lossy message bus. And in the current world, although Diego has a consistent view of the world, the fact that we have to send it through NATS, which is lossy, to the GoRouter is not the most ideal situation.
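Pilot's abstraction model, as described here, can be reduced to an interface: each platform adapter converts its native service registry into a common representation that Pilot then turns into Envoy configuration. The names below (`PlatformAdapter`, `RouteEntry`, `cfAdapter`) are illustrative stand-ins, not Pilot's real Go API.

```go
package main

import "fmt"

// RouteEntry is a minimal common representation a platform adapter
// could hand to the control plane: an externally visible hostname
// and the backends that can serve it.
type RouteEntry struct {
	Hostname string
	Backends []string
}

// PlatformAdapter is the platform-agnostic seam: Kubernetes, Cloud
// Foundry, or any other platform plugs in behind this interface.
type PlatformAdapter interface {
	Routes() []RouteEntry
}

// cfAdapter stands in for the Cloud Foundry adapter, which in the
// real system asks Copilot for the platform's routes.
type cfAdapter struct{ copilotData map[string][]string }

func (a cfAdapter) Routes() []RouteEntry {
	var out []RouteEntry
	for host, backends := range a.copilotData {
		out = append(out, RouteEntry{host, backends})
	}
	return out
}

func main() {
	var p PlatformAdapter = cfAdapter{map[string][]string{
		"app.cf.example.com": {"10.0.16.5:61001"},
	}}
	// The control plane consumes the common model without knowing
	// which platform produced it.
	for _, r := range p.Routes() {
		fmt.Println(r.Hostname, r.Backends)
	}
}
```

The point of the seam is that everything downstream of the interface (the abstraction model and the Envoy API) stays identical regardless of which platform is plugged in.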
Just to paint a more complete picture, here you can see that Route Registrar also updates its routes over NATS, and Route Registrar is commonly deployed with any BOSH-deployed VMs. And then also, to bring in the TCP router: TCP routes come through the routing API, which also learns its routes from the route emitter. But you can kind of see that in the end, the picture is a little messy, and it feels like we could clean this up a little bit. So let's instead think about what the future could hold. The first thing I want to do is talk about this routing metadata. Diego, as the orchestration layer, really doesn't care about this routing piece. It cares about putting its workloads in particular places, so let's take that out. And with that, we can remove the route emitter, and we'll just drop this cell picture altogether. So starting from Envoy and working backwards: we know that Envoy gets its configuration from Pilot through the XDS APIs. Pilot has the Cloud Foundry adapter, which talks to Copilot, so Copilot sends its routes to Pilot. And Copilot gets data from CAPI, which knows about the desired world, and from Diego, which knows about the actual world, and joins this data to provide the routes so that Pilot understands where these services live within Cloud Foundry. And again, to paint a more complete picture, there's the Route Registrar deployed with BOSH-deployed VMs. A couple of points to notice here: Diego doesn't know about routes now, and again, as I mentioned, this is something that Diego doesn't care about and isn't ideal today, so this is a cleanup we've been able to make. We have a much simpler routing tier. There is just one router that you have to worry about, so no longer do you have to make that either-or choice. We remove the routing API; that's one less durable data store you have to care about. And then we remove NATS. This is a big win on its own, and we would love to see it go.
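Copilot's core job as described here, joining Cloud Controller's desired state with Diego's actual state, can be sketched as a simple join. This is a toy model under assumed shapes: the function name `joinRoutes` and the map-based representations are invented for illustration and are not Copilot's real interface.

```go
package main

import "fmt"

// joinRoutes joins desired state (which hostname maps to which app,
// from CAPI) with actual state (where instances of that app are
// running, from Diego), yielding hostname -> backend addresses,
// which is what Pilot needs to program the Envoys.
func joinRoutes(desired map[string]string, actual map[string][]string) map[string][]string {
	routes := map[string][]string{}
	for hostname, appGUID := range desired {
		routes[hostname] = actual[appGUID]
	}
	return routes
}

func main() {
	// Desired world: the route exists and is mapped to an app.
	desired := map[string]string{"app.example.com": "app-guid-1"}
	// Actual world: two running instances of that app.
	actual := map[string][]string{
		"app-guid-1": {"10.0.0.7:61001", "10.0.0.8:61002"},
	}
	fmt.Println(joinRoutes(desired, actual))
}
```

Note that if an app has a route mapped but no running instances, the join yields an empty backend list, which is exactly the kind of case the eventually consistent bulk synchronization mentioned later has to reconcile.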
All right, so I just want to chat about a couple of the challenges that we've encountered while working with Istio. Istio is a very young project, and it moves quickly. Just for some numbers: Istio, I think, is about a year old now, and the project as a whole has more than 4,200 commits and 170-plus contributors. So things here move very quickly. This gives us an opportunity to really make the testing strategies a bit more robust, and we have contributed some additional testing strategies for their community to consider, so that we make sure that Cloud Foundry continues to be considered while changes are made. With that said, Istio is built with Kubernetes top of mind. With Google being the primary leader of this project, Kubernetes abstractions have made their way into the code base. Again, we view this as an opportunity for Cloud Foundry to get more involved and help evolve this tool to become more platform agnostic. So through various things like PRs and working group discussions, we've voiced these concerns and contributed where we can. And a large piece that we're working on is how we expose Istio functionality within Cloud Foundry. With Istio comes a lot of configurable options, and currently in Cloud Foundry, we optimize for user experience and have our own opinions of how things should be done. So how we do this is still to be determined, but we are actively running user experience research, and if you have any input, we would love to hear from you. So where are we right now? Early iterations of Copilot are currently available in istio-release, which is a BOSH release that the routing team has put together. When deployed, the generic configuration consists of two Istio routers, which are the ingress gateway Envoys, and then an Istio control VM, which consists of Copilot and Pilot. This is under heavy development, so use at your own risk.
If you have questions about it, or if you're using it and it breaks, we probably can't provide the best support right now, as we're just continuing to iterate as much as we can on it. Basic HTTP routing functionality is currently in flight. We recently finished up some work with CAPI and Diego which allows us to have bulk synchronization, so that all routes and route mappings can continue to be updated not just in an event-based manner, but also in an eventually consistent way. And there are still many, many features that we want to cover, so I'm sure you have your own list of things you'd like to see. So just before I jump into some questions, let's, I guess, cover some questions that you might have. When will this be available? As I mentioned, we do have the istio-release repo already out there, and it's public, but it's hard to give an exact timeline just due to the agile manner that we work in. So really the best way to monitor when this will become available is by watching our backlog or just keeping an eye out for various announcements. But thinking it would be ready for production use anytime soon might be risky, as Istio has yet to hit 1.0. Will the GoRouter and TCP router be deprecated? We would likely approach this in a scenario similar to how DEAs were deprecated, if this was to occur. So we would try to reach full feature parity with what currently exists in Cloud Foundry before even creating a plan to deprecate these, so I don't think you have to worry about any of these products disappearing anytime soon. Now, what about east-west traffic? I know everyone loves the service mesh, so let's talk more about that. It is super exciting, and the container networking team is actually working on a service discovery controller which will be able to talk to Copilot. So the best bet for following that sort of news is following their backlog or chatting with them. And then, what about insert-your-favorite-feature-here? When will that be available?
Again, observing our backlog is a source of truth, but if you have feedback or you have use cases for why particular features are more important than others, we want to know. So I just wanted to give some acknowledgments to the various communities that have helped to make this effort possible so far. Within CF, we've done a lot of work with CAPI, Diego, networking, and of course routing. Within the Istio community, special shout-outs to Shriram and Zack Butcher, who answer our endless questions, but the community as a whole has been very supportive and responsive to our requests. And then also the Envoy community. We haven't had to contribute too much actual code yet, but we've been jumping into your conversations and getting more familiarized there. Here's a list of resources that can be useful, for when I publish these slides. And then finally, I just wanted to thank you all for attending. Again, we really want feedback from you, and we want to know what you want to have and also how that experience should work. So we have routing office hours tomorrow at 2:45. You can always jump into the routing channel in CF Slack; we're in there. And if you have any feedback for me specifically, there's my email. And that's all I've got. Cool, so we have time for questions if anyone has any. Sure, so the question is: when the routing team wants to contribute a feature to an open source project, specifically, I guess, Istio, how does the standard Cloud Foundry pairing methodology kind of work? And this has definitely been an interesting process for us. It's a bit of uncharted territory for our team. The Istio community as a whole works off of a PR model, and so things move relatively quickly. So we always need to keep eyes on our PRs that are in flight.
So we have been experimenting with keeping either a pair or a smaller part of the team more sticky on the Istio piece of the backlog, so that we continue to monitor these things. But it is certainly something that doesn't flow as easily as pulling off the backlog like general work does. So it's a challenge and a process that we're continuing to evolve. Sure, so the question is whether, in the new model that I had shown, there was a GoRouter there. And specifically in the picture I showed, there was not, but you can think of deploying the two in parallel. So this is a super simple diagram, but you can see that they're in parallel; they're really just fronted by different load balancers. So you could configure that in some manner. And we actually do this in our testing environment, our acceptance environment. But as for what the long-term future is, they should be able to run in parallel, and then it would be up to you as to which you want to use. Yeah, so ultimately what's in front of our edge proxies is up to you. In this picture, yes, we have a load balancer that will collect all the traffic through DNS. Yep, yeah, so the apps themselves are all staying the same. It's really just the piece within Pilot that we worked on, the Cloud Foundry adapter. That's what transforms what's existing, all of the existing apps within Cloud Foundry. That's what translates that service registry to Pilot, which Envoy then knows about. Yeah, I certainly have thoughts on that. That's unfortunately outside the scope of this talk, but I'd love to chat about it afterwards. Or, chatting with the container networking team, you can discuss how the east-west traffic is planned to be implemented. But yeah, I'm going to table that for right now. What's the best way to find your backlog? Uh, Google?
Yeah, I can drop a link somewhere, but if you look at a GitHub issue on one of our products, they automatically create a story, and you can click through one of those and find our backlog. A roundabout approach, but I don't have a quick answer. Yeah, we'll set up a better link for that. All right, cool. I think I'm out of time, but feel free to find me with any more questions. Thank you.