Hey, thanks everybody for joining us today. We're excited to talk about some of the uses of Istio beyond Kubernetes. Since the project came out it has always been tied to Kubernetes, at least in people's minds, but that's not actually the case in production, and what's more, most of our production footprints are not only Kubernetes. So there's a real need for taking the service mesh beyond just Kubernetes itself, and today I'm really excited that we have a phenomenal crew here to talk through how we do that and how we've seen it hands-on.

To start with introductions: I'm Zach, I was one of the founding engineers at Tetrate and one of the earliest engineers on Istio at Google.

I'm Pratima Nambiar, I lead the teams that build and operate the mesh platform at Salesforce.

And I'm Sven, I'm one of the founders of Istio, and I've been pushing for us to support things beyond Kubernetes since day one.

Awesome. So today I'm going to start off by talking about some patterns we've seen for taking Istio beyond Kubernetes. These come out of two customers that I've worked with very closely to help achieve these deployments in production, and after that we'll dive into two more excellent case studies, from Salesforce and Google, that apply a very similar set of patterns. So without further ado, let's dive in.

The first pattern, or the first use case, is this idea of split Kubernetes and VMs. Maybe this is all in one cloud provider, so you're doing EKS and EC2, or GKE and GCE, or possibly you're on-prem and this is your vSphere footprint plus, say, the OpenShift you're running locally. The particular customer I worked with through this scenario of bridging EKS and EC2 environments took a migrate-then-modernize strategy: let's get everything into cloud so we can shut down the data center, and then we'll work incrementally within our applications to modernize them piece by piece. As they looked at this problem, four primary use cases rose to the top. Number one was encryption in transit: they're a financial services company, heavily regulated because of things like PCI DSS, and all data must be encrypted in transit. That was the first key thing they wanted. As a corollary, they also wanted consistent access policy based on those runtime identities; those two go hand in hand. Beyond that, they needed fine-grained load balancing, which I'll talk about a little. And the other big use case, one I see all the time when bridging heterogeneous environments, is controlling how traffic flows between, for example, the legacy monolith and the newer microservices being decomposed out of that monolith.

So if we look at this picture to start with, maybe we have EKS and EC2. One of the important underpinnings here, and I think this will be a common trend across all three of our stories, is that we have a common certificate authority; we need that common CA to facilitate communication, and in general it will live somewhere outside the mesh. I would strongly recommend that you anchor your mesh's root of trust in your existing PKI.
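To make those first two requirements concrete, here's a minimal sketch of what mesh-wide encryption in transit and identity-based access policy can look like in Istio configuration. The namespace, workload labels, and service account below are hypothetical, not this customer's actual setup:

```yaml
# Require mTLS for all workloads in the mesh (encryption in transit).
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # the root namespace, so this applies mesh-wide
spec:
  mtls:
    mode: STRICT
---
# Allow only the web frontend's runtime identity to call the billing service.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: billing-allow-frontend
  namespace: payments
spec:
  selector:
    matchLabels:
      app: billing
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/web/sa/frontend"]
```

Because the policy is expressed in terms of workload identities rather than IP addresses, the same rules apply whether the billing workload runs on EC2 or in EKS.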
We'll also have an Istiod deployed, and it's going to manage all the Envoys nearby; in this case maybe that's a single AZ, for example, so those EC2 instances are close enough to that Istiod that they share roughly the same fate. We want to align our failure domains. That Istiod is going to program everything like normal, with the exception of some service discovery information we'll dig into a little here.

Now what I want to pick through is how a deployment like this starts to facilitate traffic flow and achieve these four needs. First and foremost, the easiest case: how do we get traffic from existing applications into the new system, into Kubernetes? That's a bog-standard ingress use case. You go in through a gateway, and that gateway can have an assigned static IP address, a DNS name, whatever it needs. Much more interesting is how we start to facilitate communication into the non-Kubernetes environments. The first and easiest option I'll call out is the ability to go in through a front-door Envoy. This is super easy to set up: that Envoy can have a static DNS name or static IP address, for example, so you don't necessarily need to execute service discovery against it, but you still get the nice control Envoy gives you as traffic flows into those VMs. In particular, one of the big things this lets us do is split traffic as we decompose the application that's on a VM, and as we split that traffic it can go to different places: maybe to a different cluster, maybe all the way to a different site or region. This is one of the things we'll circle back to at the end: Istio gives us a set of tools for solving problems, and we can reuse them to solve many similar problems. Controlling a traffic split across, say, a VM deployment and a Kubernetes deployment looks very similar to controlling traffic across sites when failing over for disaster recovery.

So that's one way: we can go in through these front Envoys. But there are other ways we can wire this up, and how you ultimately choose to use the tools Istio provides is going to depend on your site. This particular customer actually had an existing legacy service discovery system based on ZooKeeper; we'll hear about a very similar system at Salesforce, as that approach was in vogue at the time both of those companies were built. Their applications running on VMs automatically registered with that service discovery system. For them it was important to avoid extra network hops, because the overall latency of the transaction mattered, so they wanted direct VM communication. We were able to do that in a pretty straightforward way by taking their existing ZooKeeper service discovery records and translating them into ServiceEntries for Istio in a pretty natural way. I'll point out that we had to build this bespoke for their site because they had a homegrown service discovery system.
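As a rough illustration of what that translation produces, a record from the legacy registry for a VM-hosted service might be rendered as a ServiceEntry along these lines; the hostname, port, and addresses are made up for the example:

```yaml
# One mesh-internal service backed by two VM instances discovered from the
# legacy registry. A sync job keeps the endpoint list up to date.
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: billing-vm
spec:
  hosts:
  - billing.legacy.internal        # the name the rest of the mesh uses
  location: MESH_INTERNAL
  resolution: STATIC
  ports:
  - number: 8080
    name: http
    protocol: HTTP
  endpoints:
  - address: 10.1.2.11
    labels:
      app: billing
  - address: 10.1.2.12
    labels:
      app: billing
```

With something like that in place, a pod in EKS can call billing.legacy.internal directly and its Envoy load balances across the VM endpoints without an extra hop.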
As you start to use different cloud providers, which offer their own service discovery mechanisms, chances are pretty decent that there are already integrations around. For example, EC2 can populate a Cloud Map registry for service discovery, and there are plugins to push that data into Istio. So this isn't necessarily something you have to build yourself, but it is a totally valid way to set things up, and one we see a lot; we have folks running in production doing direct pod-to-VM connectivity this way.

And then of course there are VMs that we might actually enroll into the mesh, and when I say enroll into the mesh, I mean deploy a sidecar there. That's another important idea I want people to understand: you don't necessarily have to have a sidecar deployed on the VM. It changes your security model, but you can get real benefit from just having that doorway, the Istio ingress gateway, as well. When we do have an Envoy on the VM, we can use some of Istio's new auto-registration capabilities, which I think we'll talk about later, to know exactly where that VM is, where it lives, in Istio, and enable direct communication without having to handle those lifecycle events off to the side.

So looking at these, there are some easy ways we can facilitate traffic flow; we might pick different ones based on our requirements, and they give us other capabilities, for example fine-grained load balancing. At this particular customer, fine-grained load balancing let them drop their Kafka footprint to one-sixth of its original size, because they were no longer balancing only at the connection level. There are some big benefits you can get by pushing the load balancing into the Envoys while facilitating this connectivity.

Now I want to take a step back one level higher: what if we need to do similar things across sites? The particular customer I have in mind here, Square, has to do this. They have legacy data centers where they run applications in a mix of VMs and Kubernetes, and they have a cloud footprint too. Their first two requirements were exactly the same: I need encryption in transit, and I need consistent access policy. Again, they're a financial services company; it's incredibly important. They had that same Kubernetes-to-VM need within a single site, so the right-hand picture is identical to before. However, they had the additional requirement of being able to fail over across sites, for example for disaster recovery, as well as the need to burst into cloud for additional compute capacity. They have very time-of-day-driven traffic patterns, so for certain events like market open and market close there's a lot more traffic, and the ability to burst rather than run over-provisioned all the time is absolutely massive.

If I take one of the examples I dove into before, where we have traffic coming in and we want to control how it flows across, say, a VM and a Kubernetes cluster: by the same token we can control how traffic flows across sites. If we squint, these two patterns are the same. They're doing the same things and achieving very similar goals, and that's what I want to highlight: Istio gives us a set of tools for how we can load balance.
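As a sketch of what "the same tool" means in practice: the weighted routing below could just as easily split traffic between a VM-backed host and a Kubernetes service within one site, or between a local service and one reached through another site's ingress gateway. The hostnames and weights are illustrative only, not either customer's actual configuration:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: orders-split
spec:
  hosts:
  - orders.example.internal                  # the name callers use
  http:
  - route:
    - destination:
        host: orders.legacy.internal         # VM monolith (ServiceEntry) or a remote site
      weight: 80
    - destination:
        host: orders.prod.svc.cluster.local  # new microservice in Kubernetes
      weight: 20
```

Shifting the weights over time is how you gradually decompose the monolith, and the identical mechanism drives failover or bursting across sites.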
When I say these patterns are "the same," let me dig into that for a minute. As an application running in the mesh, I get to communicate with my dependencies using a name, without really dealing with anything else, and the mesh is going to deliver that. So when I say this is the same, I mean that how the application perceives and connects with other applications is identical throughout, even though we're doing similar functionality in several places. That's an important idea, because again Istio gives us this set of tools for solving problems, and it's important to internalize what those fundamental tools are so we can reuse them. In this case, having an ingress gateway that can load balance across multiple clusters is an incredibly powerful tool.

There is one other large problem I skipped over earlier, which is introduced now that we have multiple Istiods: we need to coordinate and synchronize configuration across the system. That's a problem Istio itself does not solve natively today. There are different deployment topologies you can use, with fewer Istiods for example, but regardless of which topology you pick, you are going to have to solve how to synchronize configuration across clusters. There are a whole bunch of options there; I think Sven will talk about some of them in a little more depth in his section, but the general rule is to use your existing CD system to distribute that Istio configuration. So those are two concrete customer patterns I have personally seen firsthand for facilitating communication across disparate environments, and with that we'll dig into some other examples that I think are very similar.

Thank you, Zach. If you can go on to the next slide. Salesforce's services run on diverse infrastructure. For example, in our first-party data centers we support services running on bare metal communicating with services running on Kubernetes; our monolith runs on bare metal in first party. In our public cloud deployments we have services running on VMs and on Kubernetes talking to each other via the mesh. About four years ago, when we built our mesh platform using Envoy and our in-house control plane, we had to support services running on this diverse infrastructure, so when we wanted to adopt an open source product as our control plane a couple of years ago, the minimum viable solution had to support bare metal, VMs, and Kubernetes. We chose Istio because it is a feature-rich control plane, it meets our growing requirements, and it solves the problems we are trying to solve, so it is a good fit for Salesforce.

Let's take a closer look at what our mesh platform with Istio as the control plane looks like. We run Istiod, the control plane, on Kubernetes. Kubernetes services inject the sidecar and communicate with the control plane for config and policy updates. We have a config webhook that generates some Istio config for our services. Salesforce requires us to use an internal CA for short-lived certificates, so we configure both our control plane and our sidecars to use certificates generated by that internal CA for mTLS.
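For context on what plugging an intermediate CA into Istio typically looks like, the standard open source mechanism is a secret named cacerts that Istiod picks up at startup. This is a generic sketch with placeholder certificate content, not our actual integration, which may use a different signing setup:

```yaml
# istiod looks for a secret named "cacerts" in its own namespace and, if present,
# uses this intermediate CA (chained to the company root) to sign workload certs.
apiVersion: v1
kind: Secret
metadata:
  name: cacerts
  namespace: istio-system
type: Opaque
stringData:
  ca-cert.pem: |        # intermediate CA certificate issued by the internal PKI
    -----BEGIN CERTIFICATE-----
    (placeholder)
    -----END CERTIFICATE-----
  ca-key.pem: |         # intermediate CA private key
    -----BEGIN PRIVATE KEY-----
    (placeholder)
    -----END PRIVATE KEY-----
  root-cert.pem: |      # company root certificate
    -----BEGIN CERTIFICATE-----
    (placeholder)
    -----END CERTIFICATE-----
  cert-chain.pem: |     # chain from the intermediate up to the root
    -----BEGIN CERTIFICATE-----
    (placeholder)
    -----END CERTIFICATE-----
```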
We run the sidecar next to our monolith on bare metal, and it communicates with the control plane via an L4 load balancer. The monolith announces itself to ZooKeeper, so ZooKeeper is our service registry for non-Kubernetes workloads, and we have a synchronization service that watches those announcements in ZooKeeper and updates ServiceEntry objects in Kubernetes. The ServiceEntry object is similar to a Kubernetes Service object, and it is used to represent a service that can participate in the mesh.

Let's take a closer look at how we onboard a service running on bare metal onto the Istio-based mesh. The lifecycle of the istio-proxy sidecar is managed by the monolith, so the monolith does that for us. The istio-proxy is, as I mentioned, configured to use certificates delivered by that same internal CA. The service routes traffic to a special IP to participate in the mesh, and we have a wildcard DNS entry that resolves to that special IP; it is used to reference all services in the mesh. We also configure a Sidecar resource, a CRD provided by Istio, and through it we configure ingress and egress listeners at that special IP. Then, as I mentioned before, the monolith announces to ZooKeeper and that gets synchronized to Kubernetes as ServiceEntry objects. That is how our bare metal service is able to participate in the mesh as if it were running on Kubernetes.

Next slide. We are in the process of rolling out a new feature introduced in Istio 1.8, the auto-registration feature. Adopting it will allow us to get rid of the ZooKeeper we use as the service registry for non-Kubernetes workloads. Istio's auto-registration support includes a set of CRDs for representing non-Kubernetes workloads so they can participate in the mesh. For example, there is a WorkloadGroup CRD that lets you specify the properties of a workload for bootstrapping it and acts as a template for WorkloadEntries; it's similar to how you use a Deployment object in Kubernetes to define the properties of workloads via a pod spec. A WorkloadEntry CRD represents a single workload, similar to a pod in Kubernetes, and as I mentioned before, the ServiceEntry is similar to a Kubernetes Service object for non-Kubernetes workloads.

In our deployment we pre-create the WorkloadGroup with the WorkloadEntry template. The istio-proxy that runs next to our monolith on bare metal, or next to a service on a VM, connects to the control plane, and the control plane auto-registers it, creating the WorkloadEntry object we talked about, similar to how a pod is created in Kubernetes. You can also choose to pre-create the ServiceEntry object, but we create it via a config webhook that listens for those WorkloadEntries, so we don't have to pre-create it. That is how we're adopting the auto-registration feature so that we can get rid of the ZooKeeper we run today. Everything I've said so far also holds for VMs; for VM-based services we deploy the proxy ourselves on the VM.
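As a rough sketch of that flow (names, namespace, and addresses here are illustrative, not our actual config): we pre-create something like the WorkloadGroup below, and when a proxy for that group connects, the control plane materializes a WorkloadEntry for it.

```yaml
# Pre-created by us: a template describing VM/bare-metal workloads for this app.
apiVersion: networking.istio.io/v1alpha3
kind: WorkloadGroup
metadata:
  name: monolith
  namespace: core
spec:
  metadata:
    labels:
      app: monolith
  template:
    serviceAccount: monolith
    network: dc1-network
---
# Created automatically by istiod when a proxy for this group connects; shown
# here only to illustrate what auto-registration produces.
apiVersion: networking.istio.io/v1alpha3
kind: WorkloadEntry
metadata:
  name: monolith-10.20.30.40
  namespace: core
spec:
  address: 10.20.30.40
  labels:
    app: monolith
  serviceAccount: monolith
  network: dc1-network
```

A config webhook watching those WorkloadEntries can then generate the corresponding ServiceEntry, which is what removes the need for a separate registry.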
Istio 1.8 also supports a DNS proxy feature at the sidecar. Adopting that feature will allow us to remove the special IP and DNS I talked about earlier for referring to mesh services, because the sidecar can resolve the DNS name of any mesh service. For example, a non-Kubernetes workload can refer to a Kubernetes workload by its Kubernetes DNS name. We use this feature today to resolve services running on different Kubernetes clusters as part of our multi-cluster mesh support, and we will be using it for non-Kubernetes workloads as well. I hope that gives you a feel for how Istio is being used to support services running on diverse infrastructure at Salesforce. I will now hand it off to Sven to talk about how Google uses Istio beyond Kubernetes.

All right, thank you Pratima, and thanks Zach for the introduction. I'm going to touch briefly on a lot of the same kind of stuff; you'll see a lot of similar patterns here. Just like Salesforce, Google runs services in on-prem data centers as well as in cloud. The group I'm going to talk about here is Google's internal corporate engineering team, which builds and runs a lot of our internal services. One of the fun ones they actually started with is a system that provides the menus of the cafes; it hasn't been very useful this past year, but it was a great app before all the pandemic stuff happened. We run these corporate applications both in cloud and on-prem, and in both cases we're using the service mesh for all the things everyone uses a service mesh for: micro-segmentation at the application layer so you don't need network firewalls, operations management, the encryption requirements Zach was talking about for financial services (Google has the same sort of encryption requirement on everything), and making releases easier to roll out using the canary support. Those are the main use cases.

Within those, let's look at the applications running on Google Cloud Platform. This is a mix of internally written applications and applications provided by vendors Google works with. An interesting thing is that, because of the way Istio works, you can take those vendor applications and run Istio with them without having the source or anything, and that's a huge benefit: you get all of these controls without having the source. Again, this is a mix of VMs, containers, and serverless; we use everything, and we want them all to be able to talk to each other and to the services running on-prem or in our production environment. We need everything to be able to talk to everything, and Istio is a big help there.

For our on-prem environment we have basically the same stuff: a mix of vendor applications and custom-built internal applications. Here it's mostly VMs; there's not yet a lot of container usage on-prem. We are starting to experiment with Google's Anthos product to provide on-prem Kubernetes and run containers there, but right now it's pretty much just VMs. These on-prem data centers connect both to Google Cloud and to the production services through the front doors of those services, so there are no back doors; everything goes in as if it were anyone else.

All right, let's take a look at what this actually looks like, again at a high level here.
Zach was talking about the configuration distribution problem. We actually already have a whole system in place for this, set up to distribute a lot of the lower-level networking configuration: things like network ACLs, firewalls, and other policies on projects. We're just reusing that system and adding a plug-in to support sending out the Istio policies. That is how we distribute all of our policies to all of the API servers, and it provides a paper trail, if you will, from source control all the way to the end state. No user directly modifies anything in an API server; they submit changes to our internal repository, those changes are vetted and then rolled out carefully to the entire fleet. That makes it a lot safer, and you have your audit trail and all that kind of stuff.

We're running Istiod here as a separate, external service, using the support that was added for external Istiod; I forget whether that was in 1.7 or 1.8, but it's available now and we're taking advantage of it. It lets you run an Istiod that is not actually in any cluster: you can run it however you want, on a VM, in a separate cluster that you manage, even in serverless ways. There's a lot of opportunity to manage it without the people running applications having to worry about Istiod and care for it and feed it. So we run that, and it's hooked up to read from the various API servers in the mesh.

We are also using the auto-registration that Pratima talked about. The VMs are running sidecars, those sidecars auto-register, and we automatically create the WorkloadGroups for them based on the configuration stored in the policy manager. The whole system just kind of works and everything can talk to each other: the VM Envoys and the pod Envoys all connect and get their configuration the same way, and everything just works, so it's great. So that's a quick rundown of how Google is using Istio internally.
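For readers who haven't seen the external Istiod support: roughly, each workload cluster is installed without an in-cluster Istiod and pointed at the externally hosted one. The sketch below is a generic illustration of that idea; the address is a placeholder, the exact fields vary by Istio version, and it is not how Google's internal deployment is actually wired up:

```yaml
# Installed in a workload ("config") cluster: no istiod runs here; sidecar
# injection and xDS are served by an istiod hosted somewhere else.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  profile: external                  # skip deploying istiod in this cluster
  values:
    global:
      istioNamespace: external-istiod
    istiodRemote:
      # webhook requests for sidecar injection go to the external istiod
      injectionURL: https://ISTIOD_EXTERNAL_ADDRESS:15017/inject
```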
Well, awesome, thank y'all. So just to come back and summarize: if we look at the high-level topologies we talked through across these four use cases and we squint, fundamentally we're really solving the same set of problems, this cross-cluster, cross-site connectivity, and Istio gives us a really powerful set of tools for it. We saw that across these disparate organizations: the three of us weren't actually working particularly closely together on the development of these separate systems, but all of us converged on very similar architectures. Even some of the lower-level trade-offs, for example the ones Pratima went over about how the Envoys connect, are exactly things we've seen firsthand, that I've seen firsthand, and I think you've all seen more as well. For a group of folks who are also helping build these tools, that's incredible; it's awesome and exciting, because it says our tools are actually solving real problems. Istio today gives us a powerful vocabulary, a powerful set of tools, for solving these problems. There's maybe just a little bit of roughness around wiring it up at the edges, and there are trade-offs you're going to want to make in the context of your particular deployment, the security posture you need to maintain, and all of those things; there's no one-size-fits-all. But the primitives are there, and they're powerful, they're robust, and they're tested, so you can go do this yourself. With that, thanks everybody for coming. Any closing comments, Pratima, Sven, that y'all want to make?

All right, I'll echo what you said, Zach: seeing other people use this stuff makes me so incredibly happy and proud. Especially since I worked a lot on figuring out WorkloadGroup, WorkloadEntry, and auto-registration and making that work, it's so exciting to see someone say, yeah, we're starting to adopt it and it's going to solve problems for us, because that's what we're here for, to help solve problems. So, super exciting.

I think you guys summarized it well. We do feel like we've made the right choice in adopting Istio as our control plane for Salesforce. That's all I have to say.

All right, awesome. Thank you all, we look forward to questions. Everybody have a good day.